Abstract. Regularization plays a key role in a variety of optimization formulations of inverse
problems. A recurring question in regularization approaches is the selection of regularization pa- 
rameters, and its effect on the solution and on the optimal value of the optimization problem. The 
sensitivity of the value function to the regularization parameter can be linked directly to the Lagrange 
multipliers. In this paper, we fully characterize the variational properties of the value functions for 
a broad class of convex formulations, which are not all covered by standard Lagrange multiplier 
theory. We also present an inverse function theorem that links the value functions of different 
regularization formulations (not necessarily convex). These results have implications for the selection 
of regularization parameters, and the development of specialized algorithms. We give numerical 
examples that illustrate the theoretical results. 



1. Introduction. It is well known that there is a close connection between the
sensitivity of the optimal value of a parametric optimization problem and its Lagrange
multipliers. Consider the family of convex optimization problems

\[
\mathcal{P}(b,\tau)\qquad
\begin{array}{ll}
\displaystyle\mathop{\mathrm{minimize}}_{r,\,x} & \rho(r)\\[2pt]
\text{subject to} & Ax + r = b \;:\; u,\\
 & \phi(x) \le \tau \;:\; \mu,
\end{array}
\]

parameterized by $\tau > \inf\phi$ and $b \in \mathbb{R}^m$, where $A \in \mathbb{R}^{m\times n}$, and the functions
$\phi : \mathbb{R}^n \to \overline{\mathbb{R}} := (-\infty,\infty]$ and $\rho : \mathbb{R}^m \to \overline{\mathbb{R}}$ are closed, proper and convex, and
continuous relative to their domains. The value function

\[ v(b,\tau) = \inf \mathcal{P}(b,\tau) \]

gives the optimal objective value of $\mathcal{P}(b,\tau)$ for fixed parameters $b$ and $\tau$. If $\mathcal{P}(b,\tau)$ is a
feasible ordinary convex program (cf. [28, Section 28]), then under standard hypotheses
the subdifferential of $v$ is

\[ \partial v(b,\tau) = \{(u,\mu)\}, \tag{1.1} \]

where $u \in \mathbb{R}^m$ and $\mu \in \mathbb{R}$ are the Lagrange multipliers of $\mathcal{P}(b,\tau)$, shown next to
their constraints. This connection is extensively explored in Rockafellar's 1993 survey
paper [27].

If we allow $\phi$ to take on infinite values on the domain of the objective (which can
occur, for example, if $\phi$ is an arbitrary gauge), then $\mathcal{P}(b,\tau)$ is no longer an ordinary
convex program, and the standard Lagrange multiplier theory does not apply. While in
some cases the problem can be remodeled to overcome this difficulty, we are interested
in developing an extended Lagrange multiplier theory that avoids the need for such
reformulation, and captures a wide range of data-fitting applications. Remarkably,
even in this general setting, it is also possible to obtain explicit formulas for the
subdifferential of the value function of $\mathcal{P}(b,\tau)$, useful in many applications.
1.1. Examples. We give two simple examples illustrating the need for the extended
Lagrange multiplier theory. Both are of the form

\[ \mathop{\mathrm{minimize}}_{x}\ \tfrac{1}{2}\|Ax - b\|_2^2 \quad\text{subject to}\quad \gamma(x \mid U) \le 1, \tag{1.2} \]

where

\[ \gamma(x \mid U) := \inf\{\, \lambda \ge 0 \mid x \in \lambda U \,\} \]

is the gauge function for the closed nonempty convex set $U \subset \mathbb{R}^n$, which contains 0.
Let $A = I$ and $b = (0,-1)^T$.

For our first example, we consider the set

\[ U = \{\, x \in \mathbb{R}^2 \mid \tfrac{1}{2}x_1^2 \le x_2 \,\}, \]

defined in [28, Section 10]. The gauge for this set is an example of a closed proper and
convex function that is not continuous at a boundary point of its effective domain. It
is straightforward to show that

\[
\gamma(x \mid U) =
\begin{cases}
x_1^2/(2x_2) & \text{if } x_2 > 0,\\
0 & \text{if } x_1 = 0 = x_2,\\
\infty & \text{otherwise.}
\end{cases}
\]

The constraint region for (1.2) is the set $U$ and the unique global solution is the point
$x = 0$. However, since $0 = \gamma(0 \mid U) < 1$, the classical Lagrange multiplier theory fails:
the solution is on the boundary of the feasible region, and yet no Lagrange multiplier
exists. The problem is that the constraint is active at the solution, but not active in
the functional sense, i.e., $\gamma(0 \mid U) < 1$. In contrast, the extended multiplier theory of
Theorem 5.2 succeeds with the multiplier choice of 0.
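The discontinuity of the gauge in the first example can be observed numerically. The following is a minimal sketch, assuming the set has the form $U = \{x \in \mathbb{R}^2 \mid \tfrac{1}{2}x_1^2 \le x_2\}$; the helper names `in_U` and `gauge` are illustrative and not part of the paper. Along the boundary curve $x_2 = \tfrac{1}{2}x_1^2$ the gauge stays at 1, although its value at the origin is 0.

```python
import math

def in_U(x1: float, x2: float) -> bool:
    """Membership test for U = {x : 0.5*x1**2 <= x2} (assumed form of the set)."""
    return 0.5 * x1 * x1 <= x2

def gauge(x1: float, x2: float) -> float:
    """gamma(x | U) = inf{lam >= 0 : x in lam*U}, in closed form for this U."""
    if x1 == 0.0 and x2 == 0.0:
        return 0.0
    if x2 > 0.0:
        return x1 * x1 / (2.0 * x2)
    return math.inf

# Approach the origin along the boundary curve x2 = 0.5*x1**2.
for t in (1.0, 0.5, 0.25, 0.125):
    g = gauge(t, 0.5 * t * t)
    # Scaling x by 1/g must land back in U, by definition of the gauge.
    assert in_U(t / g, 0.5 * t * t / g)
    print(t, g)

# At the origin itself the gauge vanishes, so the constraint
# gamma(x | U) <= 1 is not active "in the functional sense".
print(gauge(0.0, 0.0))
```

Because the limit along the curve differs from the value at the origin, the gauge is lower semicontinuous but not continuous there, which is the situation in which the classical multiplier theory breaks down.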

For the second example, take $U = \mathbb{B}_2 \cap K$, where $\mathbb{B}_2$ is the unit ball associated with
the Euclidean norm on $\mathbb{R}^2$, and set $K = \{\, (x_1,x_2) \mid x_2 \ge 0 \,\}$. Then
$\gamma(x \mid \mathbb{B}_2 \cap K) = \|x\|_2 + \delta(x \mid K)$, and the constraint region for (1.2) is the set $\mathbb{B}_2 \cap K$. Again, the
origin is the unique global solution to this optimization problem, and no classical
Lagrange multiplier for this problem exists. The multiplier theory of Theorem 5.2
succeeds, again with the multiplier choice of 0.

1.2. Formulations. Appropriate definitions of the functions $\rho$ and $\phi$ can be used
to represent a range of practical problems. Choosing $\rho$ to be the 2-norm and $\phi$ to be
any norm yields the canonical regularized least-squares problem

\[ \mathop{\mathrm{minimize}}_{x}\ \|r\|_2 \quad\text{subject to}\quad Ax + r = b,\ \ \|x\| \le \tau, \tag{1.3} \]

which optimizes the misfit between the data $b$ and the forward model $Ax$, subject to
keeping $x$ appropriately bounded in some norm. Interestingly, when the optimal residual
$r$ is nonzero, the value function for this family of problems is always differentiable
in both $b$ and $\tau$; i.e., the subdifferential in (1.1) is a singleton. In particular,

\[ \nabla v(b,\tau) = \begin{bmatrix} u \\ -\|A^T u\|_* \end{bmatrix} \quad\text{with}\quad u = r/\|r\|_2, \tag{1.4} \]

where $\|\cdot\|_*$ is the norm dual to $\|\cdot\|$. The 2-norm constraint on $x$ yields a Tikhonov
regularization, popular in many inversion applications. A 1-norm constraint on $x$ yields
the Lasso problem [33], often used in sparse recovery and model-selection applications.
The gradient (1.4) is derived by van den Berg and Friedlander [6]. The analysis of the
sensitivity in $\tau$ of the value function for the Lasso problem led to the development of the
SPGL1 solver [5], currently used in a variety of sparse inverse problems, with particular
success in large-scale sparse inverse problems [21]. A subsequent analysis [8] that
allows $\phi(x)$ to be a gauge paved the way for other applications, such as group-sparsity
promotion [7].

Fig. 1.1. The misfit $\rho(r)$ (solid line) and its derivative (dashed line) as a function of the
regularization parameter for a 1-norm regularized example. The left panel shows the constrained
formulation $\mathcal{P}(b,\tau)$, which varies smoothly with $\tau$; the right panel shows that the penalized formulation
does not vary smoothly with $\lambda$ (note the reversed axis).

An alternative to $\mathcal{P}(b,\tau)$ is the class of penalized formulations

\[ \mathcal{P}_L(b,\lambda)\qquad \mathop{\mathrm{minimize}}_{x}\ \rho(b - Ax) + \lambda\phi(x), \]

where the nonnegative parameter $\lambda$ is used to control the tradeoff between the data
misfit $\rho$ and a convex regularization term $\phi$. For example, taking $\rho(r) = \|r\|_2$ and
$\phi(x) = \|x\|$ yields a formulation analogous to (1.3). This penalized formulation is



commonly used in applications of Bayesian parametric regression [24,25,30,34 36 
inference problems on dynamic linear systems [l 11 , feature selection, selective 



shrinkage, and compressed sensing Mp5][l9 , robust formulations [2 p7p8p3] , support 
vector regression 20 35 , classification 16 26 32 , and functional reconstruction [4 12 

H ■ 

From an algorithmic point of view, either formulation $\mathcal{P}(b,\tau)$ or $\mathcal{P}_L(b,\lambda)$ may be
preferable. However, a key feature of $\mathcal{P}(b,\tau)$ that sets it apart from $\mathcal{P}_L(b,\lambda)$ and allows
our comprehensive analysis is that its objective is a convex function of $(b,\tau)$. This
fact gives the convexity of the value function $v(b,\tau)$ (see section 4). In contrast, the
optimal value of the penalized formulation $\mathcal{P}_L(b,\lambda)$ is not in general a convex function
of its parameters. The following example



P(r) = k\\r\\l 



A=[l 1] , and b = 



illustrates this situation. The optimal values of $\rho$ in the two formulations, as functions
of $\tau$ and $\lambda$, respectively, are given by



p(r T ) 



|(r-3) 2 for re [0,1) 

i + i(r-2) 2 for rG [1,3) 
otherwise; 



P(r\) 



for A G [0, 1) 
\\ 2 for AG [1,2) 
otherwise. 
The optimal values and their derivatives are shown in Figure 1.1, where it is clear that
$\rho(r_\tau)$ is convex (and in this case also smooth) in $\tau$, but $\rho(r_\lambda)$ is not convex in $\lambda$.

The admissibility of variational analysis and convexity of the value function may
convince some practitioners to explore formulations of type $\mathcal{P}(b,\tau)$ rather than $\mathcal{P}_L(b,\lambda)$.
In fact, we give an example (in section 7) of how this variational information might be
used for algorithm design in the context of large-scale inverse problems.

1.3. Approach. For many practical inverse problems, the formulation of primary
interest is

\[ \mathcal{P}_R(b,\sigma)\qquad \mathop{\mathrm{minimize}}_{x}\ \phi(x) \quad\text{subject to}\quad \rho(b - Ax) \le \sigma, \]

in part because estimates of a tolerance level $\sigma$ on a data fitting error $\rho(b - Ax)$ are
more easily available than estimates of a bound on the regularization penalty $\phi(x)$.
However, the formulation $\mathcal{P}(b,\tau)$ can sometimes be easier to solve. The underlying
numerical theme is to develop methods for solving $\mathcal{P}_R(b,\sigma)$ using solutions to $\mathcal{P}(b,\tau)$.

In section 2 we present an inverse function theorem for value functions that
characterizes the relationship between $\mathcal{P}(b,\tau)$ and $\mathcal{P}_R(b,\sigma)$, and for more general
nonconvex problems. One application of this result is to establish conditions under
which it is possible to implement a root-finding approach for the nonlinear equation

\[ \text{find } \tau \text{ such that } v(b,\tau) = \sigma, \tag{1.5} \]

where $\mathcal{P}_R(b,\sigma)$ can be solved via a sequence of approximate solutions of $\mathcal{P}(b,\tau)$.
This generalizes the approach used by van den Berg and Friedlander [6, 8] for large-scale
sparse optimization applications. The convex case is especially convenient,
because both value functions are decreasing and convex. When the value function
is differentiable, Newton's method is globally monotonic and locally quadratic. In
section 5 we establish the variational properties (including conditions necessary for
differentiability) of $\mathcal{P}(b,\tau)$.
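The root-finding idea behind (1.5) can be sketched on a toy instance. The code below is an illustration under stated assumptions, not the authors' solver: it takes $A = I$, $\rho = \|\cdot\|_2$ and $\phi = \|\cdot\|_1$, so that each subproblem is a Euclidean projection onto the 1-norm ball, and it uses the slope $-\|r\|_\infty/\|r\|_2$ known for this case. The names `project_l1_ball`, `value_and_slope`, and `newton_root` are hypothetical, and the sketch assumes $0 < \sigma < \|b\|_2$.

```python
import math

def project_l1_ball(b, tau):
    """Euclidean projection of b onto {x : ||x||_1 <= tau} (sort-based method)."""
    if tau <= 0.0:
        return [0.0] * len(b)
    mags = sorted((abs(v) for v in b), reverse=True)
    if sum(mags) <= tau:
        return list(b)
    cumulative, theta = 0.0, 0.0
    for k, m in enumerate(mags, start=1):
        cumulative += m
        candidate = (cumulative - tau) / k
        if m > candidate:          # entry k is still nonzero after thresholding
            theta = candidate
    # Soft-threshold every entry by theta.
    return [math.copysign(max(abs(v) - theta, 0.0), v) for v in b]

def value_and_slope(b, tau):
    """v(tau) = min ||b - x||_2 s.t. ||x||_1 <= tau (A = I), and its slope in tau."""
    x = project_l1_ball(b, tau)
    r = [bi - xi for bi, xi in zip(b, x)]
    v = math.sqrt(sum(ri * ri for ri in r))
    slope = -max(abs(ri) for ri in r) / v   # -||A^T r||_inf / ||r||_2 with A = I
    return v, slope

def newton_root(b, sigma, tol=1e-10, max_iter=50):
    """Newton's method on v(tau) = sigma, started at tau = 0."""
    tau = 0.0
    for _ in range(max_iter):
        v, slope = value_and_slope(b, tau)
        if abs(v - sigma) <= tol:
            break
        tau += (sigma - v) / slope
    return tau

b = [3.0, -1.0, 0.5]
tau_star = newton_root(b, sigma=1.0)
print(tau_star, value_and_slope(b, tau_star)[0])
```

Since the value function is convex and decreasing in $\tau$, the iterates started at $\tau = 0$ increase toward the root, which is the monotone behaviour described above.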

In section 4 we derive dual representations of $\mathcal{P}(b,\tau)$ and their optimality conditions.
These are used in section 5 to characterize the variational properties of the value
function $v$. The conjugate, horizon, and perspective functions arise naturally as part
of the analysis, and we present a calculus (section 3) for these functions that allows
explicit computation of the subdifferential of $v$ for large classes of misfit functions $\rho$
and regularization functions $\phi$ (see section 6).

One of the motivating problems for the general analysis and methods we present
is the treatment of a robust misfit function $\rho$ (such as the popular Huber norm) in
the context of sparsity promotion, which typically involves a nonsmooth regularizer
$\phi$. We demonstrate (section 7) how the sensitivity analysis can be applied to solve a
sparse nonnegative denoising problem with robust misfit measures, both convex and
nonconvex.

The proofs of all of the results are relegated to the appendix (section 8).

1.4. Notation. For a matrix $A \in \mathbb{R}^{m\times n}$, the image and inverse image of the sets
$E$ and $F$, respectively, are given by the sets

\[ AE = \{\, y \mid y = Ax,\ x \in E \,\} \quad\text{and}\quad A^{-1}F = \{\, x \mid Ax \in F \,\}. \]

For a convex function $p$, its epigraph is denoted $\mathrm{epi}\,p$, and the level set is denoted
$\mathrm{lev}_p(\tau) = \{\, x \mid p(x) \le \tau \,\}$. The function $\delta(x \mid X)$ is the indicator to a convex set $X$.
2. An inverse function theorem for optimal value functions. Let
$\psi_i : X \subseteq \mathbb{R}^n \to \overline{\mathbb{R}}$, $i \in \{1,2\}$, be arbitrary scalar-valued functions, and consider the
following pair of related problems, and their associated value functions:

\[ v_1(\sigma) := \inf_{x \in X}\ \psi_1(x) + \delta((x,\sigma) \mid \mathrm{epi}\,\psi_2), \tag{$\mathcal{P}_{1,2}(\sigma)$} \]

\[ v_2(\tau) := \inf_{x \in X}\ \psi_2(x) + \delta((x,\tau) \mid \mathrm{epi}\,\psi_1). \tag{$\mathcal{P}_{2,1}(\tau)$} \]

This pair corresponds to the problems $\mathcal{P}_R(b,\sigma)$ and $\mathcal{P}(b,\tau)$, defined in section 1, with
the identifications

\[ \psi_1(x) = \rho(b - Ax) \quad\text{and}\quad \psi_2(x) = \phi(x). \]

Our goal in this section is to establish general conditions under which the value
functions $v_1$ and $v_2$ satisfy the inverse-function relationship

\[ v_1 \circ v_2 = \mathrm{id}, \]

and for which the pair of problems $\mathcal{P}_{1,2}(\sigma)$ and $\mathcal{P}_{2,1}(\tau)$ have the same solution
sets. The pair of problems $\mathcal{P}(b,\tau)$ and $\mathcal{P}_R(b,\sigma)$ always satisfy the conditions of the
next theorem, which applies to functions that are not necessarily convex.



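Theorem 2.1. Let $\psi_i : X \subseteq \mathbb{R}^n \to \overline{\mathbb{R}}$, $i \in \{1,2\}$, be as defined in $\mathcal{P}_{1,2}(\sigma)$, and define

\[ S_{1,2} := \{\, \sigma \in \mathbb{R} \mid \emptyset \ne \operatorname{argmin} \mathcal{P}_{1,2}(\sigma) \subset \{\, x \in X \mid \psi_2(x) = \sigma \,\} \,\}. \]

Let $S_{2,1}$ be defined symmetrically to $S_{1,2}$ by interchanging the roles of the indices.
Then, for every $\sigma \in S_{1,2}$,

(a) $v_2(v_1(\sigma)) = \sigma$, and

(b) $\operatorname{argmin} \mathcal{P}_{1,2}(\sigma) = \operatorname{argmin} \mathcal{P}_{2,1}(v_1(\sigma)) \subset \{\, x \in X \mid \psi_1(x) = v_1(\sigma) \,\}$.

Moreover, $S_{2,1} = \{\, v_1(\sigma) \mid \sigma \in S_{1,2} \,\}$, and so

\[ \{\, (\sigma, v_1(\sigma)) \mid \sigma \in S_{1,2} \,\} = \{\, (v_2(\tau), \tau) \mid \tau \in S_{2,1} \,\}. \]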
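Part (a) of the theorem can be checked by brute force on a scalar instance. The sketch below uses illustrative choices that are not taken from the paper: $\psi_1(x) = (x-3)^2$, $\psi_2(x) = |x|$, and a uniform grid standing in for $X = \mathbb{R}$. For these choices the constraint $\psi_2(x) \le \sigma$ is active at the minimizer whenever $\sigma \in [0,3]$, so such $\sigma$ belong to $S_{1,2}$.

```python
def psi1(x: float) -> float:
    """Illustrative misfit (an assumption for this sketch)."""
    return (x - 3.0) ** 2

def psi2(x: float) -> float:
    """Illustrative regularizer (an assumption for this sketch)."""
    return abs(x)

# Uniform grid standing in for X = R; the small slack guards against rounding.
GRID = [i / 1000.0 for i in range(-6000, 6001)]
SLACK = 1e-12

def v1(sigma: float) -> float:
    """v1(sigma) = inf psi1(x) subject to psi2(x) <= sigma, over the grid."""
    return min(psi1(x) for x in GRID if psi2(x) <= sigma + SLACK)

def v2(tau: float) -> float:
    """v2(tau) = inf psi2(x) subject to psi1(x) <= tau, over the grid."""
    return min(psi2(x) for x in GRID if psi1(x) <= tau + SLACK)

for sigma in (0.5, 1.0, 2.0):
    # The last printed value should reproduce sigma, as in part (a).
    print(sigma, v1(sigma), v2(v1(sigma)))
```

For $\sigma > 3$ the constraint is inactive, $v_1$ is constant, and the composition no longer returns $\sigma$; this is why the theorem restricts attention to $S_{1,2}$.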



3. Convex analysis. In order to present the duality results of section 4, we
require a few basic tools from convex analysis. There are many excellent references for
the necessary background material in convex analysis, with several appearing within
the past 10 years. In this study we make use of Rockafellar [28] and Rockafellar and
Wets [29], although similar results can be found elsewhere [9, 10, 22]. We review the
necessary results here.

3.1. Functional operations. The convex function $h : \mathbb{R}^n \to \overline{\mathbb{R}}$ generates the
following convex functions:

1. Legendre-Fenchel conjugate of $h$:

\[ h^*(y) := \sup_{x}\ [\langle y, x\rangle - h(x)]. \]

2. Horizon function of $h$:

\[ h^\infty(z) := \sup_{x \in \mathrm{dom}\,h}\ [h(x + z) - h(x)]. \]

3. Perspective function of $h$:

\[
\tilde h(z,\lambda) :=
\begin{cases}
\lambda h(\lambda^{-1}z) & \text{if } \lambda > 0,\\
\delta(z \mid 0) & \text{if } \lambda = 0,\\
+\infty & \text{if } \lambda < 0.
\end{cases}
\]

4. Closure of the perspective function of $h$:

\[
h^\pi(z,\lambda) :=
\begin{cases}
\lambda h(\lambda^{-1}z) & \text{if } \lambda > 0,\\
h^\infty(z) & \text{if } \lambda = 0,\\
+\infty & \text{if } \lambda < 0.
\end{cases}
\]

If $h$ is additionally closed and proper, then so are $h^*$ (Theorem 12.2), $h^\infty$ (Theorem
8.5), and $h^\pi$ (p. 35, Corollaries 8.5.2 and 13.5.1), where the results are all from
Rockafellar [28]. Note that these functions can also be defined by considering the
epigraphical perspective and properties of convex sets. The conjugate function can
be derived from the support function of the epigraph. The epigraph of the horizon
function of $h$ is the recession cone of the epigraph of $h$. The perspective function of $h$
is the positively homogeneous function generated by the convex function $\hat h(x,\lambda) :=
h(x) + \delta(\lambda \mid \{1\})$ [28, pp. 35 and 67].

Note that for every closed proper and convex function $h$, the associated horizon and
perspective functions, $h^\infty$ and $h^\pi$, are positively homogeneous and so can be represented
as the support functional for some convex set [28, Theorem 13.2]. Moreover, if h is a
support functional, then h°° = h = h. 
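To make the conjugate and perspective operations concrete, here is a small numerical sketch for the illustrative choice $h(x) = \tfrac{1}{2}x^2$ on $\mathbb{R}$, for which $h^*(y) = \tfrac{1}{2}y^2$ and the perspective equals $z^2/(2\lambda)$ for $\lambda > 0$. The function names and the grid-based supremum are assumptions made for this example only.

```python
def h(x: float) -> float:
    """Illustrative convex function h(x) = x^2 / 2."""
    return 0.5 * x * x

def conjugate_on_grid(y: float, lo: float = -10.0, hi: float = 10.0,
                      n: int = 20001) -> float:
    """Approximate h*(y) = sup_x [y*x - h(x)] by a maximum over a uniform grid."""
    step = (hi - lo) / (n - 1)
    return max(y * (lo + i * step) - h(lo + i * step) for i in range(n))

def perspective(z: float, lam: float) -> float:
    """lam * h(z / lam) for lam > 0; the lam = 0 branch (horizon function)
    is deliberately left out of this sketch."""
    if lam <= 0.0:
        raise ValueError("this sketch only covers lam > 0")
    return lam * h(z / lam)

# Conjugate of x^2/2 at y = 2 should be close to y^2/2 = 2.
print(conjugate_on_grid(2.0))

# Positive homogeneity of the perspective in the pair (z, lam).
print(perspective(2.0, 4.0), perspective(6.0, 12.0) / 3.0)
```

The second print shows the same number twice, reflecting that scaling $(z,\lambda)$ by $t > 0$ scales the perspective by $t$.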

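3.2. Cones. We associate the following cones with a convex set $C$ and a convex
function $h$.

1. Polar cone: The polar cone of $C$ is denoted by $C^\circ$:

\[ C^\circ := \{\, x^* \mid \langle x^*, x\rangle \le 0 \ \ \forall x \in C \,\}. \]

2. Recession cone: The recession cone of $C$ is denoted by $C^\infty$:

\[ C^\infty := \{\, x \mid C + x \subset C \,\} = \{\, x \mid y + \lambda x \in C \ \ \forall \lambda \ge 0,\ \forall y \in C \,\}. \]

3. Barrier cone: The barrier cone of $C$ is denoted by $\mathrm{bar}(C)$:

\[ \mathrm{bar}(C) := \{\, x^* \mid \text{for some } \beta \in \mathbb{R},\ \langle x, x^*\rangle \le \beta \ \ \forall x \in C \,\}. \]

4. Horizon cone: The horizon cone [28, Theorem 8.7] of $h$ is denoted by

\[ \mathrm{hzn}(h) := \{\, y \mid h^\infty(y) \le 0 \,\} = [\mathrm{lev}_h(\tau)]^\infty \quad \forall \tau > \inf h. \]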

3.3. Calculus rules. The conjugate, horizon, and perspective transformations
in section 3.1 possess a rich calculus. We use this calculus to obtain explicit expressions
for the functions $\rho^*$, $\phi^*$, $(\phi^*)^\infty$ and $(\phi^*)^\pi$ that play a crucial role in the applications
of section 6. The calculus for conjugates and horizons is developed in many references,
including [28]. We extend these results to the perspective transformation in a
straightforward way. Below we review three convex operations and give the associated
calculus rules.
Affine composition. Let $p : \mathbb{R}^m \to \overline{\mathbb{R}}$ be a closed proper and convex function,
$A \in \mathbb{R}^{m\times n}$, and $b \in \mathbb{R}^m$, such that $(\mathrm{Ran}(A) - b) \cap \mathrm{ri}(\mathrm{dom}\,p) \ne \emptyset$. Let

\[ h(x) := p(Ax - b). \]

Then

\[
\begin{aligned}
h^\infty(z) &= p^\infty(Az),\\
h^*(y) &= \inf_{A^T u = y}\ [\langle b, u\rangle + p^*(u)],\\
h^\pi(x,\lambda) &= p^\pi(Ax - \lambda b, \lambda),
\end{aligned}
\]

where, for $\lambda = 0$,

\[ p^\pi(Ax, 0) = p^\infty(Ax). \]

All three functions are closed proper and convex.

Inverse linear image. (Proof is in section 8.) Let $p : \mathbb{R}^n \to \overline{\mathbb{R}}$ be closed proper
and convex, and let $A \in \mathbb{R}^{m\times n}$. Define the inverse linear image of $p$ under $A$ to be

\[ h(w) := \inf_{Ax = w}\ p(x). \]

Then $h^*(y) = p^*(A^T y)$, and if $(A^T)^{-1}\,\mathrm{ri}(\mathrm{dom}\,p^*) \ne \emptyset$, then

\[ h^\infty(z) = \inf_{Ax = z}\ p^\infty(x) \quad\text{and}\quad h^\pi(w,\lambda) = \inf_{Ax = w}\ p^\pi(x,\lambda), \]

where all of the functions $h$, $h^\infty$, and $h^\pi$ are closed proper and convex.

Addition. Let $h_i : \mathbb{R}^n \to \overline{\mathbb{R}}$, for $i = 1, \dots, m$, be closed proper convex functions.
If $h := h_1 + \cdots + h_m$ is not identically $+\infty$, then

\[ h^\infty = h_1^\infty + \cdots + h_m^\infty \quad\text{and}\quad h^\pi = h_1^\pi + \cdots + h_m^\pi, \]

where both are closed proper and convex. Moreover,

\[ \text{if } \bigcap_{i=1}^m \mathrm{ri}(\mathrm{dom}\,h_i) \ne \emptyset, \text{ then } h^* = h_1^* \mathbin{\square} \cdots \mathbin{\square} h_m^* \]

is closed proper and convex, where $\square$ denotes infimal convolution.

Infimal convolution. Let $h_i : \mathbb{R}^n \to \overline{\mathbb{R}}$, for $i = 1, \dots, m$, be closed proper and
convex functions. Set $h = h_1 \mathbin{\square} \cdots \mathbin{\square} h_m$. Then $h^* = h_1^* + \cdots + h_m^*$, and

\[ \text{if } \bigcap_{i=1}^m \mathrm{ri}(\mathrm{dom}\,h_i^*) \ne \emptyset, \text{ then } h^\infty = h_1^\infty \mathbin{\square} \cdots \mathbin{\square} h_m^\infty, \]

and

\[ h^\pi(x,\lambda) = \inf_{x_1 + \cdots + x_m = x}\ [h_1^\pi(x_1,\lambda) + \cdots + h_m^\pi(x_m,\lambda)]. \]

All three functions are closed proper and convex.

The results for both the horizon and conjugation transformations can be found
in many references [9, 22, 28, 29]. Properties for the perspective transformation are
less well documented. Since addition is a special case of affine composition, and
infimal convolution is a special case of inverse linear image, we need only establish the
perspective calculus formulas for affine composition and the inverse linear image. The
affine-composition formula follows from [28, Theorem 9.5] and the definition of the
perspective transformation, and the inverse linear image is established in section 8.
4. The dual problem. For our analysis, it is convenient to consider the (equivalent)
epigraphical formulation

\[ v(b,\tau) = \mathop{\mathrm{minimize}}_{x}\ f(x,b,\tau) \tag{$\mathcal{P}$} \]

of $\mathcal{P}(b,\tau)$, where

\[ f(x,b,\tau) := \rho(b - Ax) + \delta((x,\tau) \mid \mathrm{epi}\,\phi). \]

Because the functions $\rho$ and $\phi$ are convex, it immediately follows that $f$ is also convex.
This fact gives the convexity of the value function $v$, since it is the inf-projection of
the objective function in $x$ [28, Theorem 5.3].

We make use of the duality framework described by Rockafellar and Wets [29, Chapter
11, Section H], and associate $(\mathcal{P})$ with its dual problem and its corresponding value
function

\[ \hat v(b,\tau) = \mathop{\mathrm{maximize}}_{u,\,\mu}\ \langle b, u\rangle + \tau\mu - f^*(0,u,\mu). \tag{$\mathcal{D}$} \]

This dual problem is the key to understanding the variational behavior of the value
function. To access these results we must compute the conjugate of $f$. For this it is
useful to have an alternative representation for the support function of the epigraph,
which is the conjugate of the indicator function appearing in $f$.



4.1. Reduced dual problem. In Theorem 4.2 of this section, we derive an
equivalent representation of the dual problem $(\mathcal{D})$ in terms of $u$ alone. This is the
reduced dual problem for $(\mathcal{P})$. We first present a result about conjugates for epigraphs
and lower level sets.

Lemma 4.1 (Conjugates for epigraphs and lower level sets). Let $h : \mathbb{R}^n \to \overline{\mathbb{R}}$ be
closed proper and convex. Then

\[ \delta^*((y,\mu) \mid \mathrm{epi}\,h) = (h^*)^\pi(y,-\mu), \tag{4.1} \]

\[ \delta^*(y \mid \mathrm{lev}_h(\tau)) = \mathrm{cl}\Big(\inf_{\mu \ge 0}\ [\tau\mu + (h^*)^\pi(y,\mu)]\Big). \tag{4.2} \]

Expressions (4.2) and (4.1) are established in [28, Theorem 13.5] and [28, Corollary
13.5.1], respectively, where it is shown that (4.1) is a consequence of (4.2). In section 8
we provide a different proof where it is shown that (4.2) follows from (4.1). The
arguments provided in this proof are instructive for later computations.

The conjugate $f^*(y,u,\mu)$ of the perturbation function $f(x,b,\tau)$ defined in $(\mathcal{P})$ is
now easily computed:

\[
\begin{aligned}
f^*(y,u,\mu)
&= \sup_{x,\,b,\,\tau}\ [\langle y, x\rangle + \langle u, b\rangle + \mu\tau - \rho(b - Ax) - \delta((x,\tau) \mid \mathrm{epi}\,\phi)]\\
&= \sup_{x,\,w,\,\tau}\ [\langle y, x\rangle + \langle u, w + Ax\rangle + \mu\tau - \rho(w) - \delta((x,\tau) \mid \mathrm{epi}\,\phi)]\\
&= \sup_{x,\,\tau}\ [\langle y + A^T u, x\rangle + \mu\tau - \delta((x,\tau) \mid \mathrm{epi}\,\phi)] + \sup_{w}\ [\langle u, w\rangle - \rho(w)]\\
&= (\phi^*)^\pi(y + A^T u, -\mu) + \rho^*(u),
\end{aligned}
\tag{4.3}
\]

where the second line uses the change of variable $w = b - Ax$ and the final equality
follows from (4.1). With this representation of the conjugate
of $f$, we obtain the following equivalent representations for the dual problem $(\mathcal{D})$.
Theorem 4.2 (Dual representations). For problem $(\mathcal{P})$, define the functions

\[ g_\tau(u) := \rho^*(u) + \delta^*(A^T u \mid \mathrm{lev}_\phi(\tau)), \]

\[ p_\tau(s,\mu) := \tau\mu + (\phi^*)^\pi(s,\mu). \]

Then the value function for $(\mathcal{D})$ has the following equivalent characterizations:

\[
\begin{aligned}
\hat v(b,\tau)
&= \sup_{u}\ \Big[\langle b, u\rangle - \rho^*(u) - \inf_{\mu \ge 0}\ p_\tau(A^T u, \mu)\Big]\\
&= \sup_{u}\ \big[\langle b, u\rangle - \rho^*(u) - \delta^*(A^T u \mid \mathrm{lev}_\phi(\tau))\big] && (\mathcal{D}_r)\\
&= g_\tau^*(b) && \text{(4.4a)}\\
&= \mathrm{cl}(v(\cdot,\tau))(b), && \text{(4.4b)}
\end{aligned}
\]

where the closure operation in (4.4b) refers to the lower semi-continuous hull of
the convex function $b \mapsto v(b,\tau)$. In particular, this implies the weak duality inequality
$\hat v(b,\tau) \le v(b,\tau)$. Moreover, if the function $\rho$ is differentiable, the solution $u$ to $(\mathcal{D}_r)$ is
unique.

One implication of these characterizations is that a solution $(u,\mu)$ to the full
dual $(\mathcal{D})$ can be obtained in two stages: solve $(\mathcal{D}_r)$ for $u$, then solve $\inf_{\mu \ge 0} p_\tau(A^T u, \mu)$
for $\mu$. Indeed, in some cases there is a closed form expression for the solution. This
is the case in the examples of section 6. This motivates the following result, which
provides the basis for the extended Lagrange multiplier theory.



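Lemma 4.3. Let $\phi$ be as in $(\mathcal{P})$, with $\tau > \inf\phi$ and $x \in \mathrm{lev}_\phi(\tau) \cap \mathrm{dom}\,\phi$.

1. For every $s$, we have

\[ \delta^*(s \mid \mathrm{lev}_\phi(\tau)) \le \inf_{\mu \ge 0}\ p_\tau(s,\mu). \tag{4.5} \]

2. Let $(x,s)$ satisfy $s \in N(x \mid \mathrm{lev}_\phi(\tau))$ and define

\[ S_1 = \operatorname*{argmin}_{\mu \ge 0}\ p_\tau(s,\mu) \quad\text{and}\quad S_2 = \Big\{\, \mu \ge 0 \ \Big|\ s \in \mu\,\partial\phi(x),\ \ 0 = \mu(\phi(x) - \tau) \,\Big\}, \]

where, for $x \in \mathrm{dom}\,\phi$,

\[
\mu\,\partial\phi(x) :=
\begin{cases}
\{\, \mu z \mid z \in \partial\phi(x) \,\} & \text{if } \mu > 0 \text{ and } x \in \mathrm{dom}\,\partial\phi,\\
N(x \mid \mathrm{dom}\,\phi) & \text{if } \mu = 0.
\end{cases}
\]

If either $S_1$ or $S_2$ is non-empty, then $S_1 = S_2$ and equality holds in (4.5).

The second part of the definition for $\mu\,\partial\phi(x)$ is motivated by two observations.
First, by [28, Theorem 8.2], for any non-empty closed convex set $C$ we have $\lim_{\mu \downarrow 0} \mu C =
C^\infty$. Second, it is easily shown that if $\partial\phi(x) \ne \emptyset$, then $\partial\phi(x)^\infty = N(x \mid \mathrm{dom}\,\phi)$.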



Lemma 4.4 (Coercivity of primal and dual objectives).

1. The objective function $f(\cdot,b,\tau)$ of $(\mathcal{P})$ is coercive if and only if

\[ \mathrm{hzn}(\phi) \cap [-A^{-1}\,\mathrm{hzn}(\rho)] = \{0\}. \tag{4.6} \]

2. The objective function of the reduced dual $(\mathcal{D}_r)$ is coercive if and only if

\[ b \in \mathrm{int}\,(\mathrm{dom}\,\rho + A\,\mathrm{lev}_\phi(\tau)). \tag{4.7} \]

5. Variational properties of the value function. With $(\mathcal{D})$ and the representation
of the conjugate of the objective of $(\mathcal{P})$ (cf. (4.3)), we can specialize [29, Theorem 11.39]
to obtain a characterization of the subdifferential of the value function, as well as
sufficient conditions for strong duality.

Theorem 5.1 (Strong duality and subgradient of the value function). Let $v$ and $\hat v$
be as in $(\mathcal{P})$ and $(\mathcal{D})$, respectively. It is always the case that

\[ v(b,\tau) \ge \hat v(b,\tau) \qquad \text{(weak duality)}. \]

If $(b,\tau) \in \mathrm{int}\,(\mathrm{dom}\,v)$, then

\[ v(b,\tau) = \hat v(b,\tau) \qquad \text{(strong duality)} \]

and $\partial v(b,\tau) \ne \emptyset$ with

\[ \partial v(b,\tau) = \operatorname*{argmax}_{u,\ \mu \le 0}\ \big[\langle b, u\rangle - \rho^*(u) - p_\tau(A^T u, -\mu)\big]. \]

Furthermore, for fixed $(b,\tau) \in \mathbb{R}^m \times \mathbb{R}$,

\[ \mathrm{dom}\,f(\cdot,b,\tau) \ne \emptyset \iff b \in \mathrm{dom}\,\rho + A(\mathrm{lev}_\phi(\tau)). \]

In particular, this implies that

\[ (b,\tau) \in \mathrm{int}\,(\mathrm{dom}\,v) \iff b \in \mathrm{int}\,(\mathrm{dom}\,\rho + A(\mathrm{lev}_\phi(\tau))). \]

We now derive a characterization of the subdifferential $\partial v(b,\tau)$ based on the
solutions of the reduced dual $(\mathcal{D}_r)$.



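Theorem 5.2 (Value function subdifferential). Suppose that

\[ b \in \mathrm{ri}\,(\mathrm{dom}\,\rho) + A\,\mathrm{ri}\,(\mathrm{lev}_\phi(\tau)), \tag{5.1a} \]

\[ \mathrm{ri}\,(\mathrm{dom}\,\rho^*) \cap \big[A^{-T}\,\mathrm{ri}\,(\mathrm{bar}\,(\mathrm{lev}_\phi(\tau)))\big] \ne \emptyset. \tag{5.1b} \]

1. If the pair $(x,u)$ satisfies

\[ x \in \mathrm{lev}_\phi(\tau), \quad u \in \partial\rho(b - Ax) \quad\text{and}\quad A^T u \in N(x \mid \mathrm{lev}_\phi(\tau)), \tag{5.1c} \]

then $x$ solves $(\mathcal{P})$ and $u$ solves $(\mathcal{D}_r)$.

2. If $x$ solves $(\mathcal{P})$ and (5.1a) holds, there exists $u$ such that $(x,u)$ satisfies (5.1c).

3. If $u$ solves $(\mathcal{D}_r)$ and (5.1b) holds, there exists $x$ such that $(x,u)$ satisfies (5.1c).

4. If either (4.6) and (5.1a) holds, or (4.7) and (5.1b) holds, then $\partial v(b,\tau) \ne \emptyset$ and
$\operatorname{argmin}_{\mu \ge 0}\ p_\tau(A^T u, \mu) \ne \emptyset$ for all $(x,u) \in \mathbb{R}^n \times \mathbb{R}^m$
satisfying (5.1c), with

\[
\begin{aligned}
\partial v(b,\tau)
&= \left\{ (u,-\mu) \ \middle|\
\begin{array}{l}
(x,u) \in \mathbb{R}^n \times \mathbb{R}^m \text{ satisfy (5.1c) and}\\
\mu \in \operatorname{argmin}_{\mu \ge 0}\ p_\tau(A^T u, \mu)
\end{array}
\right\} && \text{(5.1d)}\\
&= \left\{ (u,-\mu) \ \middle|\
\begin{array}{l}
\exists\, x \in \mathrm{lev}_\phi(\tau) \text{ s.t. } 0 \in -A^T u + \mu\,\partial\phi(x),\\
\text{where } u \in \partial\rho(b - Ax),\ \mu \ge 0, \text{ and } \mu(\phi(x) - \tau) = 0
\end{array}
\right\}. && \text{(5.1e)}
\end{aligned}
\]

The representation (5.1e) expresses the elements of $\partial v(b,\tau)$ in terms of classical
Lagrange multipliers when $\mu > 0$, and extends the classical theory when $\mu = 0$. (See
Lemma 4.3 for the definition of $\mu\,\partial\phi(x)$.) Because $v$ is convex, it is subdifferentially
regular, and so for fixed $b$, we can obtain the subdifferential of $v$ with respect to $\tau$
alone [29, Corollary 10.11], i.e.,

\[ \partial_\tau v(b,\tau) = \{\, \mu \mid (u,\mu) \in \partial v(b,\tau) \,\}. \]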



6. Applications. In this section we apply the calculus rules of section 3.3 in
conjunction with Theorem 5.2 to evaluate the subdifferential of the value function in
three important special cases: where $\phi$ is a gauge-plus-indicator, a restricted quadratic
penalty (RQP) function, and an affine composition with an RQP function. In all cases
we allow $\rho$ to be an arbitrary convex function.

6.1. Gauge-plus-indicator. The case where $\rho$ is a linear least-squares objective
and $\phi$ is a gauge function is studied in [8]. We generalize this case by allowing $\rho$ to be
a possibly non-smooth and non-finite-valued convex function, and take

\[ \phi(x) := \gamma(x \mid U) + \delta(x \mid X), \tag{6.1} \]

where $U$ is a nonempty closed convex set containing the origin. Note that $\phi$ is a gauge
if and only if $X$ is a convex cone.

Observe that the requirement that $x \in X$ is unaffected by varying $\tau$ in the
constraint $\phi(x) \le \tau$. Indeed, the problem $(\mathcal{P})$ is unchanged if we replace $\rho$ and $\phi$ by

\[ \hat\rho(y,x) := \rho(y) + \delta(x \mid X) \quad\text{and}\quad \hat\phi(x) := \gamma(x \mid U), \tag{6.2} \]

with $A$ and $b$ replaced by

\[ \hat A := \begin{bmatrix} A \\ -I \end{bmatrix} \quad\text{and}\quad \hat b := \begin{bmatrix} b \\ 0 \end{bmatrix}. \]

Hence, the generalization of [8] discussed here only concerns the application to more
general convex functions $\rho$.

There are two ways one can proceed with this application. One can use $\phi$ as given
in (6.1) or use $\hat\rho$ and $\hat\phi$ as defined in (6.2). We choose the former in order to highlight
the presence of the abstract constraint $x \in X$. But we emphasize that, regardless of the
formulation chosen, the end result is the same.



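Lemma 6.1. Let $\phi$ be as given in (6.1). The following formulas hold:

\[
\begin{aligned}
\gamma(\cdot \mid U) &= \delta^*(\cdot \mid U^\circ), && \text{(6.3a)}\\
\mathrm{dom}\,\gamma(\cdot \mid U) &= \mathrm{cone}(U) = \mathrm{bar}(U^\circ), && \text{(6.3b)}\\
\mathrm{dom}\,\phi &= \mathrm{cone}(U) \cap X, && \text{(6.3c)}\\
\mathrm{lev}_\phi(\tau) &= (\tau U) \cap X, && \text{(6.3d)}\\
\mathrm{hzn}(\phi) &= U^\infty \cap X^\infty, \text{ and} && \text{(6.3e)}\\
\mathrm{cl}\,(\mathrm{bar}\,(\mathrm{lev}_\phi(\tau))) &= \mathrm{cl}\,(\mathrm{bar}(U)) + \mathrm{cl}\,(\mathrm{bar}(X)). && \text{(6.3f)}
\end{aligned}
\]

If it is further assumed that

\[ \mathrm{ri}\,(\tau U) \cap \mathrm{ri}\,(X) \ne \emptyset, \tag{6.4} \]

then we also have

\[
\begin{aligned}
\phi^*(z) &= \min_{s}\ [\delta^*(z - s \mid X) + \delta(s \mid U^\circ)], && \text{(6.5a)}\\
(\phi^*)^\pi(z,\mu) &= \min_{s}\ [\delta^*(z - s \mid X) + \delta(s \mid \mu U^\circ)], && \text{(6.5b)}\\
\delta^*(z \mid \mathrm{lev}_\phi(\tau)) &= \min_{\mu \ge 0}\ [\tau\mu + (\phi^*)^\pi(z,\mu)] && \text{(6.5c)}\\
&= \min_{s}\ [\delta^*(z - s \mid X) + \tau\gamma(s \mid U^\circ)], \text{ and} && \text{(6.5d)}\\
N(x \mid \mathrm{lev}_\phi(\tau)) &= N(x \mid \tau U) + N(x \mid X). && \text{(6.5e)}
\end{aligned}
\]

If $\bar s$ minimizes (6.5d), then $\bar\mu := \gamma(\bar s \mid U^\circ)$ minimizes (6.5c).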
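Identity (6.3a) can be checked directly in a familiar special case. The sketch below assumes $U$ is the unit 1-norm ball, so that $\gamma(\cdot \mid U) = \|\cdot\|_1$ and $U^\circ$ is the unit $\infty$-norm ball; the support function of $U^\circ$ is then a maximum over its vertices $\{-1,1\}^n$. The function names are illustrative only.

```python
from itertools import product

def gauge_l1_ball(x):
    """gamma(x | U) for U the unit 1-norm ball, which is just ||x||_1."""
    return sum(abs(xi) for xi in x)

def support_linf_ball(x):
    """delta*(x | U°) for U° the unit inf-norm ball.

    A linear function attains its maximum over a polytope at a vertex,
    so it suffices to enumerate the sign vectors in {-1, 1}^n.
    """
    return max(sum(wi * xi for wi, xi in zip(w, x))
               for w in product((-1.0, 1.0), repeat=len(x)))

x = [1.5, -2.0, 0.25]
print(gauge_l1_ball(x), support_linf_ball(x))
```

Both printed values agree, which is the statement $\gamma(\cdot \mid U) = \delta^*(\cdot \mid U^\circ)$ for this particular pair of sets. The vertex enumeration grows as $2^n$ and is meant only for small illustrations.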




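By Theorem 5.1, the subdifferential of $v(b,\tau)$ is obtained by solving the dual
problem $(\mathcal{D})$ or the reduced dual $(\mathcal{D}_r)$. When $\phi$ is given by (6.1), the results of Lemma
6.1 show that the dual and the reduced dual take the form

\[
\begin{aligned}
&\sup_{u,\,\mu}\ \big[\langle b, u\rangle + \tau\mu - (\phi^*)^\pi(A^T u, -\mu) - \rho^*(u)\big] && \text{(6.6)}\\
&\quad= \sup_{u}\ \big[\langle b, u\rangle - \rho^*(u) - \delta^*(A^T u \mid \mathrm{lev}_\phi(\tau))\big]\\
&\quad= \sup_{u}\ \Big[\langle b, u\rangle - \rho^*(u) - \min_{s}\ [\delta^*(A^T u - s \mid X) + \tau\gamma(s \mid U^\circ)]\Big]\\
&\quad= \sup_{u,\,s}\ \big[\langle b, u\rangle - \rho^*(u) - \delta^*(A^T u - s \mid X) - \delta^*(s \mid \tau U)\big]. && \text{(6.7)}
\end{aligned}
\]

Moreover, if $(u,s)$ solves (6.7), then $(u,\mu)$ solves (6.6) with $\mu = -\gamma(s \mid U^\circ)$, and

\[ (u, -\gamma(s \mid U^\circ)) \in \partial v(b,\tau). \]

We have the following version of Theorem 5.2 when $\phi$ is given by (6.1).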
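Theorem 6.2. Let $\phi$ be given by (6.1) under the assumption that (6.4) holds, and
consider the following two conditions:

\[ b \in \mathrm{ri}\,(\mathrm{dom}\,\rho + A[\tau U \cap X]) = \mathrm{ri}\,(\mathrm{dom}\,\rho) + A[\mathrm{ri}\,(\tau U) \cap \mathrm{ri}\,(X)], \tag{6.8} \]

\[ \exists\, u \in \mathrm{ri}\,(\mathrm{dom}\,\rho^*) \text{ such that } A^T u \in \mathrm{ri}\,(\mathrm{bar}(U)) + \mathrm{ri}\,(\mathrm{bar}(X)). \tag{6.9} \]

1. If the triple $(x,u,s)$ satisfies

\[ u \in \partial\rho(b - Ax), \quad x \in X \cap (\tau U), \quad s \in N(x \mid \tau U), \quad\text{and}\quad A^T u - s \in N(x \mid X), \tag{6.10} \]

then $x$ solves $\mathcal{P}(b,\tau)$ and $(u,s)$ solves (6.7).

2. If $x$ solves $\mathcal{P}(b,\tau)$ and (6.8) holds, then there exists a pair $(u,s)$ such that $(x,u,s)$
satisfies (6.10).

3. If $(u,s)$ solves (6.7) and (6.9) holds, then there exists $x$ such that $(x,u,s)$ satisfies
(6.10).

4. If either

\[ U^\infty \cap X^\infty \cap [-A^{-1}\,\mathrm{hzn}(\rho)] = \{0\} \text{ and (6.8) holds,} \tag{6.11} \]

or

\[ b \in \mathrm{int}\,(\mathrm{dom}\,\rho + A[\tau U \cap X]) \text{ and (6.9) holds,} \tag{6.12} \]

then $\partial v(b,\tau) \ne \emptyset$ and is given by

\[
\begin{aligned}
\partial v(b,\tau)
&= \left\{ (u, -\gamma(s \mid U^\circ)) \ \middle|\ (x,u,s) \in \mathbb{R}^n \times \mathbb{R}^m \times \mathbb{R}^n \text{ satisfy (6.10)} \right\} && \text{(6.13)}\\
&= \left\{ (u, -\mu) \ \middle|\
\begin{array}{l}
\exists\, x \in X \cap (\tau U) \text{ s.t. } 0 \in -A^T u + N(x \mid X) + \mu\,\partial\gamma(x \mid U), \text{ where}\\
u \in \partial\rho(b - Ax),\ 0 \le \mu, \text{ and } \mu(\gamma(x \mid U) - \tau) = 0
\end{array}
\right\}.
\end{aligned}
\]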



6.1.1. Gauge penalties. In [8], the authors study the case where $\rho$ is a linear
least-squares objective, $\phi$ is a gauge functional, and $X = \mathbb{R}^n$. In this case, [8, Lemma
2.1] and [8, Corollary 2.2(b)] can be deduced from (6.7) and (6.13), respectively.
Another application is to the case where $\rho$ is finite-valued and smooth, $\phi$ is a norm,
and $X$ is a generalized box. In this case, all of the conditions of Theorems 5.1 and
6.2 are satisfied, solutions to both $\mathcal{P}(b,\tau)$ and (6.7) exist, and $v$ is differentiable. In
particular, consider the non-negative 1-norm-constrained inversion, where

\[ \phi(x) = \|x\|_1 + \delta(x \mid \mathbb{R}^n_+), \]

and $\rho$ is any differentiable convex function. The subdifferential characterization given
in Theorem 5.1 can be explicitly computed via Theorem 6.2. In the notation of (6.1),

\[ U^\circ = \{\, x \mid \|x\|_\infty \le 1 \,\} \]

is the dual ball $\mathbb{B}_\infty$, and $X$ in (6.1) is $\mathbb{R}^n_+$. Since the function $\rho$ is differentiable, the
solution $u$ to the dual (6.7) is unique [28, Theorem 26.3]. Therefore, Theorem 6.2
gives the existence of a unique gradient

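\[ \nabla_b v(b,\tau) = \nabla\rho(b - A\bar x), \]

where $\bar x$ is any solution that achieves the optimal value; this is the $u$-component of (6.13),
since $u \in \partial\rho(b - A\bar x)$ there. The derivative with respect
to $\tau$ is immediately given by Theorem 6.2 as

\[ \nabla_\tau v(b,\tau) = -\gamma(A^T\nabla\rho(b - A\bar x) \mid U^\circ) = -\|A^T\nabla\rho(b - A\bar x)\|_\infty. \tag{6.14} \]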

6.2. Restricted quadratic penalty functions. We now consider the case

\[ \phi(x) := \sup_{w \in U}\ \big[\langle x, w\rangle - \tfrac{1}{2}\langle w, Bw\rangle\big], \tag{6.15} \]

where $U \subset \mathbb{R}^n$ is nonempty closed and convex with $0 \in U$, and $B \in \mathbb{R}^{n\times n}$ is positive
semi-definite. We call this class of functions restricted quadratic penalty (RQP)
functions since their conjugates are given by

\[ \phi^*(w) = \tfrac{1}{2}\langle w, Bw\rangle + \delta(w \mid U). \tag{6.16} \]

If the set $U$ is polyhedral convex, then the function $\phi$ is called a piecewise linear-quadratic
penalty function [29, Example 11.18]. Since $B$ is positive semi-definite there
is a matrix $L \in \mathbb{R}^{n\times k}$ such that $B = LL^T$, where $k$ is the rank of $B$. Using $L$, the
calculus rules in section 3.3 give the following alternative representation for $\phi$:

\[
\begin{aligned}
\phi(x) &= \sup_{w}\ \big[\langle w, x\rangle - \tfrac{1}{2}\|L^T w\|_2^2 - \delta(w \mid U)\big]\\
&= \inf_{x_1 + x_2 = x}\ \Big[\delta^*(x_1 \mid U) + \inf_{Ls = x_2}\ \tfrac{1}{2}\|s\|_2^2\Big]\\
&= \inf_{s}\ \big[\tfrac{1}{2}\|s\|_2^2 + \delta^*(x - Ls \mid U)\big]\\
&= \inf_{s}\ \big[\tfrac{1}{2}\|s\|_2^2 + \gamma(x - Ls \mid U^\circ)\big],
\end{aligned}
\tag{6.17}
\]

where the final equality follows from [28, Theorem 14.5] since $0 \in U$. Note that the
function class (6.15) includes all gauge functionals for sets containing the origin. By
(6.16), it easily follows that

\[
(\phi^*)^\pi(w,\mu) =
\begin{cases}
\frac{1}{2\mu}\|w\|_B^2 + \delta(w \mid \mu U) & \text{if } \mu > 0,\\
\delta(w \mid U^\infty \cap \mathrm{Nul}(B)) & \text{if } \mu = 0,\\
+\infty & \text{if } \mu < 0,
\end{cases}
\]

where $\|\cdot\|_B$ denotes the seminorm induced by $B$, i.e., $\|w\|_B := \sqrt{w^T B w}$.

The next result catalogues important properties of the function $\phi$ given in (6.15).



VARIATIONAL PROPERTIES OF VALUE FUNCTIONS 






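Lemma 6.3. Let $\phi$ be given by (6.15) with $\tau > 0$. Then

$$\operatorname{dom}\phi = \operatorname{cone}(U^\circ) + \operatorname{Ran}(B) \quad\text{and}\quad \operatorname{hzn}(\phi) = \operatorname{cone}(U)^\circ,$$

in particular, $\phi$ is coercive if and only if $0 \in \operatorname{int}(U)$. Moreover,

$$\delta^*(w \mid \operatorname{lev}_\phi(\tau)) = \min_{\lambda\ge 0}\left[\tau\lambda + (\phi^*)^\pi(w,\lambda)\right] \tag{6.18}$$

$$= \begin{cases} \tau\,\gamma(w\mid U) + \dfrac{\|w\|_B^2}{2\,\gamma(w\mid U)} & \text{if } \gamma(w\mid U) > \|w\|_B/\sqrt{2\tau}, \\ \sqrt{2\tau}\,\|w\|_B & \text{if } \gamma(w\mid U) \le \|w\|_B/\sqrt{2\tau}, \end{cases} \tag{6.19}$$

where the minimizing $\lambda$ in (6.18) is given by

$$\lambda = \max\left\{\gamma(w\mid U),\ \|w\|_B/\sqrt{2\tau}\right\}. \tag{6.20}$$

In particular, the formula (6.19) implies that

$$\operatorname{bar}(\operatorname{lev}_\phi(\tau)) = \operatorname{dom}(\delta^*(\cdot \mid \operatorname{lev}_\phi(\tau))) = \operatorname{dom}(\gamma(\cdot\mid U)) = \operatorname{cone}(U).$$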
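Formulas (6.19) and (6.20) are simple to evaluate once $\gamma(w \mid U)$ and $\|w\|_B$ are known. The following sketch is a hedged illustration and not code from the paper: it takes the two scalars as inputs and compares the result with the support function of the level set of the scalar Huber function ($B = 1$, $U = [-\kappa,\kappa]$), whose level set is an interval. The helper names are hypothetical.

```python
import math


def levelset_support(gauge_w, seminorm_w, tau):
    """Return (value, lam) from formulas (6.19)-(6.20).

    gauge_w    : gamma(w | U), assumed finite
    seminorm_w : ||w||_B = sqrt(w^T B w)
    tau        : level, assumed tau > 0
    """
    lam = max(gauge_w, seminorm_w / math.sqrt(2.0 * tau))
    if lam == 0.0:
        return 0.0, 0.0
    return tau * lam + seminorm_w ** 2 / (2.0 * lam), lam


def huber_level_radius(kappa, tau):
    """Largest a >= 0 whose scalar Huber value is <= tau (hypothetical helper)."""
    if tau <= 0.5 * kappa ** 2:
        return math.sqrt(2.0 * tau)
    return tau / kappa + 0.5 * kappa


kappa, w = 1.0, 3.0
for tau in (0.125, 2.0):
    value, lam = levelset_support(abs(w) / kappa, abs(w), tau)
    # The support function of the interval [-a, a] at w is a * |w|.
    print(tau, value, huber_level_radius(kappa, tau) * abs(w))
```

The two levels are chosen so that each branch of (6.19) is exercised once.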



We now apply Theorem 5.2 to the case where $\phi$ is given by (6.15).



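Theorem 6.4. Let $\phi$ be given by (6.15), and consider the following two conditions:

$$\exists\, x \in \operatorname{ri}(\operatorname{dom}\phi) \ \text{such that}\ \phi(x) < \tau \ \text{and}\ b - Ax \in \operatorname{ri}(\operatorname{dom}\rho) \tag{6.21}$$

and

$$\exists\, u \in \operatorname{ri}(\operatorname{dom}\rho^*) \ \text{such that}\ A^T u \in \operatorname{ri}(\operatorname{cone}(U)). \tag{6.22}$$

1. If the pair $(\bar x,\bar u)$ satisfy

$$\bar x \in \operatorname{lev}_\phi(\tau), \quad \bar u \in \partial\rho(b - A\bar x) \quad\text{and}\quad A^T\bar u \in N(\bar x \mid \operatorname{lev}_\phi(\tau)), \tag{6.23}$$

then $\bar x$ solves $P(b,\tau)$ and $\bar u$ solves $\mathcal{D}_\tau$.

2. If $\bar x$ solves $P(b,\tau)$ and (6.21) holds, then there exists $\bar u$ such that (6.23) holds.

3. If $\bar u$ solves $\mathcal{D}_\tau$ and (6.22) holds, then there exists $\bar x$ such that (6.23) holds.

4. If either

$$\operatorname{cone}(U)^\circ \cap [-A^{-1}\operatorname{hzn}(\rho)] = \{0\} \ \text{and (6.21) holds,} \tag{6.24}$$

or

$$b \in \operatorname{int}(\operatorname{dom}\rho + A\operatorname{lev}_\phi(\tau)) \ \text{and (6.22) holds,} \tag{6.25}$$

then $\partial v(b,\tau) \ne \emptyset$ and is given by

$$\begin{aligned}
\partial v(b,\tau) &= \left\{ \begin{pmatrix} \bar u \\ -\bar\mu \end{pmatrix} \;\middle|\; \begin{array}{l} \exists\, \bar x \ \text{s.t.}\ (\bar x,\bar u) \ \text{satisfy (6.23) and} \\ \bar\mu = \max\left\{\gamma(A^T\bar u \mid U),\ \|A^T\bar u\|_B/\sqrt{2\tau}\right\} \end{array} \right\} \\
&= \left\{ \begin{pmatrix} \bar u \\ -\bar\mu \end{pmatrix} \;\middle|\; \begin{array}{l} \exists\, \bar x \in \operatorname{lev}_\phi(\tau) \ \text{s.t.}\ 0 \in -A^T\bar u + \bar\mu \odot \partial\phi(\bar x) \ \text{where} \\ \bar u \in \partial\rho(b - A\bar x),\ 0 \le \bar\mu \ \text{and}\ \bar\mu(\phi(\bar x) - \tau) = 0 \end{array} \right\}.
\end{aligned} \tag{6.26}$$

Fig. 6.1. Huber (left) and Vapnik (right) penalties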



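In the following corollary we exploit the structure of $\phi$ to refine the multiplier description of $\partial v(b,\tau)$ given in (6.26).

Corollary 6.5. Consider the problem $P(b,\tau)$ with $\phi$ given by (6.15). A pair $(\bar x,\bar u)$ satisfies (6.23) if and only if $\bar x \in \operatorname{lev}_\phi(\tau)$, $\bar u \in \partial\rho(b - A\bar x)$, and either

$$A^T\bar u \in N(\bar x \mid \operatorname{dom}\phi), \quad\text{or} \tag{6.27a}$$

$$\exists\, \bar\mu > 0,\ w \in U \ \text{such that}\ \bar x \in Bw + N(w \mid U) \ \text{and}\ A^T\bar u = \bar\mu w. \tag{6.27b}$$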



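6.2.1. Huber penalty. A popular function in the PLQ class is the Huber penalty [23]:

$$\phi(x) = \sup_{w\in[-\kappa,\kappa]^n}\left[\langle x, w\rangle - \tfrac12\|w\|_2^2\right] = \sum_{i=1}^n \phi_i(x_i), \qquad \phi_i(x_i) := \begin{cases} x_i^2/2 & \text{if } |x_i| \le \kappa, \\ \kappa|x_i| - \kappa^2/2 & \text{otherwise.} \end{cases}$$

The Huber function is of form (6.15), with $B = I$ and $U = [-\kappa,\kappa]^n$. In this case, $U \cap \operatorname{Nul}(B) = \{0\}$ so that the conditions of Corollary 6.5 hold.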
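The agreement between the supremum form and the piecewise form of the Huber penalty can be verified directly in the scalar case. The sketch below is illustrative only: it uses the fact that a concave quadratic in $w$ over an interval is maximized by clipping the unconstrained maximizer $w = x$ to $[-\kappa,\kappa]$. The function names are hypothetical.

```python
def huber_sup(x, kappa):
    """Scalar Huber value via sup over w in [-kappa, kappa] of x*w - w^2/2."""
    w = max(-kappa, min(kappa, x))  # clipped maximizer of the concave quadratic
    return x * w - 0.5 * w * w


def huber_piecewise(x, kappa):
    """Scalar Huber value via the closed-form piecewise expression."""
    if abs(x) <= kappa:
        return 0.5 * x * x
    return kappa * abs(x) - 0.5 * kappa ** 2


kappa = 1.5
for x in (-4.0, -1.5, -0.2, 0.0, 0.7, 1.5, 3.0):
    print(x, huber_sup(x, kappa), huber_piecewise(x, kappa))
```

Each row prints the same value twice, once from each representation.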

A graph of the scalar component function $\phi_i$ is shown in Figure 6.1. The Huber penalty is robust to outliers, since it grows linearly rather than quadratically once $|x_i| > \kappa$. For any misfit function $\rho$, Theorem 6.4 can be used to easily compute the subgradient $\partial v(b,\tau)$ of the value function. If the regularity condition (6.21) is satisfied (e.g., if $\rho$ is finite valued), then Theorem 6.4 implies that

$$\partial v(b,\tau) = \left\{ \begin{pmatrix} \bar u \\ -\bar\mu \end{pmatrix} \;\middle|\; \begin{array}{l} (\bar x,\bar u) \ \text{satisfy (6.23) and} \\ \bar\mu = \max\left\{\kappa^{-1}\|A^T\bar u\|_\infty,\ \|A^T\bar u\|_2/\sqrt{2\tau}\right\} \end{array} \right\}.$$

In particular, if $\rho$ is differentiable and finite-valued, $\bar u = \nabla\rho(b - A\bar x)$ is unique and

$\nabla v(b,\tau) =$

6.3. Affine composition with RQP functions. Next consider the case where $\phi$ takes the form

$$\phi(x) := \psi(Hx + c), \quad\text{where}\quad \psi(y) := \sup_{w\in U}\left[\langle y, w\rangle - \tfrac12\langle w, Bw\rangle\right], \tag{6.28}$$

$H \in \mathbb{R}^{\ell\times n}$ is injective, $c \in \mathbb{R}^{\ell}$, $U \subset \mathbb{R}^{\ell}$ is nonempty closed and convex with $0 \in U$, and $B \in \mathbb{R}^{\ell\times\ell}$ is symmetric and positive semi-definite. We assume that

$$\exists\, x \ \text{such that}\ Hx + c \in \operatorname{ri}(\operatorname{dom}\psi),$$

where $\operatorname{dom}\psi = \operatorname{cone}(U^\circ) + \operatorname{Ran}(B)$ (Lemma 6.3). We show that the function $\phi$ in (6.28) is an instance of the piecewise linear quadratic function considered in section 6.2.



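To see this we make the following definitions:

$$\tilde b := \begin{pmatrix} b \\ c \end{pmatrix}, \quad \tilde A := \begin{bmatrix} A & 0 \\ -H & I \end{bmatrix}, \quad \tilde B := \begin{bmatrix} 0 & 0 \\ 0 & B \end{bmatrix}, \quad \tilde\rho\begin{pmatrix} y \\ z \end{pmatrix} := \rho(y) + \delta(z \mid \{0\}), \quad\text{and}$$

$$\tilde\phi\begin{pmatrix} x \\ s \end{pmatrix} := \sup_{(v,w)\in\{0\}\times U}\left[\left\langle \begin{pmatrix} x \\ s \end{pmatrix}, \begin{pmatrix} v \\ w \end{pmatrix}\right\rangle - \tfrac12\left\langle \begin{pmatrix} v \\ w \end{pmatrix}, \tilde B\begin{pmatrix} v \\ w \end{pmatrix}\right\rangle\right] = \delta^*(x \mid \{0\}) + \psi(s).$$

With these definitions, the two problems $P(b,\tau)$ and

$$\text{minimize}\ \ \tilde\rho(\tilde b - \tilde A\tilde x) \quad\text{subject to}\quad \tilde\phi(\tilde x) \le \tau$$

are equivalent. In addition, we have the relationships

$$\tilde\rho^*\begin{pmatrix} u \\ r \end{pmatrix} = \rho^*(u) + \delta^*(r \mid \{0\}), \qquad \tilde\phi^*\begin{pmatrix} v \\ w \end{pmatrix} = \delta(v \mid \{0\}) + \psi^*(w),$$

$$\gamma\left(\begin{pmatrix} v \\ w \end{pmatrix} \;\middle|\; \{0\}\times U\right) = \delta(v \mid \{0\}) + \gamma(w \mid U), \quad\text{and}$$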



£>|{0}) + MI 



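Moreover, the reduced dual $\mathcal{D}_\tau$ becomes

$$\sup_{H^T r = A^T u}\left[\langle b, u\rangle + \langle c, r\rangle - \rho^*(u) - \delta^*(r \mid \operatorname{lev}_\psi(\tau))\right]. \tag{6.29}$$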

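Using standard methods of convex analysis, we obtain the following result as a direct consequence of Theorem 6.4 and [29, Corollary 10.11].

Theorem 6.6. Let $\phi$ be given by (6.28), and consider the following two conditions:

$$\exists\, x \ \text{such that}\ Hx + c \in \operatorname{ri}(\operatorname{dom}\psi),\ \psi(Hx + c) < \tau,\ \text{and}\ b - Ax \in \operatorname{ri}(\operatorname{dom}\rho) \tag{6.30}$$

and

$$\exists\, u \in \operatorname{ri}(\operatorname{dom}\rho^*) \ \text{and}\ r \in \operatorname{ri}(\operatorname{cone}(U)) \ \text{such that}\ \begin{pmatrix} u \\ r \end{pmatrix} \in \operatorname{Nul}\left(\begin{bmatrix} A \\ -H \end{bmatrix}^T\right). \tag{6.31}$$

1. If the triple $(\bar x,\bar u,\bar r)$ satisfies

$$\bar x \in \operatorname{lev}_\phi(\tau), \quad \bar u \in \partial\rho(b - A\bar x), \quad \bar r \in N(H\bar x + c \mid \operatorname{lev}_\psi(\tau)) \quad\text{and}\quad A^T\bar u = H^T\bar r, \tag{6.32}$$

then $\bar x$ solves $P(b,\tau)$ and $(\bar u,\bar r)$ solves (6.29).

2. If $\bar x$ solves $P(b,\tau)$ and (6.30) holds, there exists $(\bar u,\bar r)$ such that (6.32) holds.

3. If $(\bar u,\bar r)$ solves (6.29) and (6.31) holds, there exists $\bar x$ such that (6.32) holds.

4. If either

$$H^{-1}[\operatorname{cone}(U)^\circ] \cap [-A^{-1}\operatorname{hzn}(\rho)] = \{0\} \ \text{and (6.30) holds,}$$

or

$$\begin{pmatrix} b \\ c \end{pmatrix} \in \operatorname{int}\left(\operatorname{dom}\rho \times \operatorname{lev}_\psi(\tau) + \operatorname{Ran}\begin{bmatrix} A \\ -H \end{bmatrix}\right) \ \text{and (6.31) holds,}$$

then $\partial v(b,\tau) \ne \emptyset$ and is given by

$$\begin{aligned}
\partial v(b,c,\tau) &= \left\{ \begin{pmatrix} \bar u \\ \bar r \\ -\bar\mu \end{pmatrix} \;\middle|\; \begin{array}{l} \exists\, \bar x \in \mathbb{R}^n \ \text{s.t.}\ (\bar x,\bar u,\bar r) \ \text{satisfy (6.32) and} \\ \bar\mu = \max\left\{\gamma(\bar r \mid U),\ \|\bar r\|_B/\sqrt{2\tau}\right\} \end{array} \right\} \\
&= \left\{ \begin{pmatrix} \bar u \\ \bar r \\ -\bar\mu \end{pmatrix} \;\middle|\; \begin{array}{l} \exists\, \bar x \in \mathbb{R}^n \ \text{s.t.}\ c + H\bar x \in \operatorname{lev}_\psi(\tau), \\ \bar u \in \partial\rho(b - A\bar x),\ \bar r \in \bar\mu \odot \partial\psi(c + H\bar x),\ \bar\mu \ge 0, \\ \bar\mu(\psi(c + H\bar x) - \tau) = 0,\ \text{and}\ A^T\bar u = H^T\bar r \end{array} \right\}.
\end{aligned}$$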



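Corollary 6.7. Consider the problem $P(b,\tau)$ with $\phi$ given by (6.28). Then $(\bar x,\bar u,\bar r)$ satisfies (6.32) if and only if

$$H\bar x + c \in \operatorname{lev}_\psi(\tau), \quad \bar u \in \partial\rho(b - A\bar x), \quad A^T\bar u = H^T\bar r,$$

and either $\bar r \in N(H\bar x + c \mid \operatorname{dom}\psi)$, or

$$\exists\, \bar\mu > 0,\ w \in U \ \text{such that}\ H\bar x + c \in Bw + N(w \mid U) \ \text{and}\ \bar r = \bar\mu w.$$

6.3.1. Vapnik penalty. The Vapnik penalty

$$\rho(r) = \sup_{u\in[0,1]^{2n}}\left\langle \begin{pmatrix} r - \epsilon \\ -r - \epsilon \end{pmatrix}, u \right\rangle = (r - \epsilon)_+ + (-r - \epsilon)_+$$

is an important example in the extended PLQ class that cannot be represented using the PLQ class of the previous section. The scalar version is shown in the right panel of Figure 6.1. In this case,

$$H = \begin{bmatrix} I \\ -I \end{bmatrix}, \quad c = -\begin{pmatrix} \epsilon\mathbf{1} \\ \epsilon\mathbf{1} \end{pmatrix}, \quad B = 0 \in \mathbb{R}^{2n\times 2n}, \quad\text{and}\quad U = [0,1]^{2n}.$$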



In order to satisfy (6.32), we need to find a triple $(\bar x,\bar u,\bar w)$ with $\bar w = [\bar w_1\ \bar w_2]^T \in [0,1]^{2n}$ so that $\bar u \in \partial\rho(b - A\bar x)$ and $A^T\bar u = H^T\bar w = \bar w_1 - \bar w_2$. We claim that either $\bar w_1(i) = 0$ or $\bar w_2(i) = 0$ for all $i$. To see this, observe that $\bar w \in N(H\bar x + c \mid \operatorname{lev}_\psi(\tau))$, so

$$\langle \bar w,\ y - (H\bar x + c)\rangle \le 0$$

whenever $\psi(y) \le \tau$. Taking $y$ first with $-\epsilon$ as the only nonzero in the $i$th coordinate, and then with $-\epsilon$ as the only nonzero in the $(n+i)$th coordinate, we get

$$\bar w_1(i)(-\bar x(i)) \le 0 \quad\text{and}\quad \bar w_2(i)(\bar x(i)) \le 0.$$

If $\bar x(i) < 0$, from the first equation we get $\bar w_1(i) = 0$, while if $\bar x(i) > 0$, we get $\bar w_2(i) = 0$ from the second equation. If $\bar x(i) = 0$, then taking $y = 0$ gives

$$\bar w_1(i)\epsilon \le 0 \quad\text{and}\quad \bar w_2(i)\epsilon \le 0,$$

so $\bar w_1(i) = \bar w_2(i) = 0$. Since $A^T\bar u = \bar w_1 - \bar w_2$, and $\bar w_1(i)$ or $\bar w_2(i)$ is $0$ for each $i$, we get $\bar\mu = \gamma(\bar w \mid [0,1]^{2n}) = \|A^T\bar u\|_\infty$. Hence, the subdifferential $\partial v$ is computed in precisely the same way for the Vapnik regularization as for the 1-norm.
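The supremum representation of the Vapnik penalty can be checked by brute force for small $n$, since a linear function attains its maximum over the box $[0,1]^{2n}$ at a vertex. The sketch below is an illustration, not code from the paper: it forms $y = Hr + c$ with $H = [I;\,-I]$ and $c = -(\epsilon\mathbf{1};\,\epsilon\mathbf{1})$ and compares the vertex maximum with the closed form, read here as a sum over components. The function names are hypothetical.

```python
from itertools import product


def vapnik_closed_form(r, eps):
    """Sum over i of (r_i - eps)_+ + (-r_i - eps)_+."""
    return sum(max(ri - eps, 0.0) + max(-ri - eps, 0.0) for ri in r)


def vapnik_sup(r, eps):
    """Sup over u in [0,1]^(2n) of <H r + c, u>, by enumerating box vertices."""
    y = [ri - eps for ri in r] + [-ri - eps for ri in r]  # y = H r + c
    return max(
        sum(uj * yj for uj, yj in zip(u, y))
        for u in product((0.0, 1.0), repeat=len(y))
    )


r, eps = [0.3, -2.0, 1.5], 0.5
print(vapnik_closed_form(r, eps), vapnik_sup(r, eps))
```

Vertex enumeration costs $2^{2n}$ evaluations, so this check is only meant for very small $n$.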



7. Numerical example: robust nonnegative BPDN. In this example, we recover a non-negative undersampled sparse signal from a set of very noisy measurements using several formulations of (P). We compare the performance of three different penalty functions $\rho$: least-squares, Huber (see section 6.2.1), and a nonconvex penalty arising from the Student's t distribution (see e.g., [3]). The regularizing function $\phi$ in all of the examples is the sum of the 1-norm and the indicator of the positive orthant (see section 6.1.1).

The formulations using Huber or Student's t misfits are robust alternatives to the nonnegative basis pursuit problem [13]. The Huber misfit agrees with the quadratic penalty for small residuals, but is relatively insensitive to larger residuals. The Student's t misfit is the negative log-likelihood of the Student's t distribution (up to scaling and an additive constant),

$$\rho_s(x) = \log(1 + x^2/\nu), \tag{7.1}$$

where $\nu$ is the degrees of freedom parameter.
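A minimal sketch of the Student's t misfit (7.1) and its derivative follows. It assumes the misfit of a residual vector is the sum of the scalar penalties over its components, which is an assumption made for illustration; the function names are hypothetical. The derivative $2x/(\nu + x^2)$ tends to zero as $|x|$ grows, which is why large outliers have little influence.

```python
import math


def student_t(r, nu):
    """Sum over components of log(1 + r_i^2 / nu), following (7.1)."""
    return sum(math.log(1.0 + ri * ri / nu) for ri in r)


def student_t_grad(r, nu):
    """Componentwise derivative of (7.1): 2 r_i / (nu + r_i^2)."""
    return [2.0 * ri / (nu + ri * ri) for ri in r]


nu = 4.0
for x in (0.1, 1.0, 10.0, 100.0):
    # The gradient rises, peaks near sqrt(nu), then decays toward zero.
    print(x, student_t([x], nu), student_t_grad([x], nu)[0])
```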

For each penalty $\rho$, our aim is to solve the problem

$$\underset{x\ge 0}{\text{minimize}}\ \ \|x\|_1 \quad\text{subject to}\quad \rho(b - Ax) \le \sigma,$$

via a series of approximate solutions of (P). The 1-norm regularizer on $x$ encourages a sparse solution. In particular, we solve the nonlinear equation (1.5), where $v$ is the value function of (P). This is the approach used by the SPGL1 software package [8]; the underlying theory, however, does not cover the Huber function. Also, $\phi$ is not everywhere finite valued, which violates [8, Assumption 3.1]. Finally, the Student's t misfit (7.1) is nonconvex; however, the inverse function relationship (cf. Theorem 2.1) still holds, so we can achieve our goal, provided we can solve the root-finding problem.

Fig. 7.1. Left, top to bottom: True signal, and reconstructions via least-squares, Huber, and Student's t. Right, top to bottom: true errors, and least-squares, Huber, and Student's t residuals.



Formula (6.14) computes the derivative of the value function associated with $P(b,\tau)$ for any convex differentiable $\rho$. The derivative requires $\nabla\rho$, evaluated at the optimal residual associated with $P(b,\tau)$. For the Huber case, this is given by

$$(\nabla\rho(b - A\bar x))_i = \operatorname{sign}(b_i - A_i\bar x)\cdot\min(|b_i - A_i\bar x|,\ \kappa).$$
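The Huber gradient formula amounts to clipping each residual to $[-\kappa,\kappa]$. The sketch below is a hedged illustration with hypothetical function names: it implements the formula for a residual vector $r = b - A\bar x$ that is assumed to be already computed, and compares it with central finite differences of the Huber misfit.

```python
import math


def huber(r, kappa):
    """Huber misfit of a residual vector, summed over components."""
    total = 0.0
    for ri in r:
        if abs(ri) <= kappa:
            total += 0.5 * ri * ri
        else:
            total += kappa * abs(ri) - 0.5 * kappa ** 2
    return total


def huber_grad(r, kappa):
    """Componentwise gradient: sign(r_i) * min(|r_i|, kappa)."""
    return [math.copysign(min(abs(ri), kappa), ri) for ri in r]


r, kappa, h = [0.3, -2.0, 5.0], 1.0, 1e-6
grad = huber_grad(r, kappa)
for i in range(len(r)):
    up = r[:i] + [r[i] + h] + r[i + 1:]
    down = r[:i] + [r[i] - h] + r[i + 1:]
    print(grad[i], (huber(up, kappa) - huber(down, kappa)) / (2 * h))
```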



The Student's t misfit is also smooth, but nonconvex. Therefore, the formula (6.14) may still be applied, with the caveat that there is no guarantee of success. However, in all of the numerical experiments, we are able to find the root of (1.5).
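The root-finding step can be illustrated on a toy problem where the value function is available in closed form. The sketch below is not the SPGL1 implementation. It assumes that (1.5) has the form $v(b,\tau) = \sigma$, and it uses the simplifications $A = I$, $\rho = \tfrac12\|\cdot\|_2^2$, and $\phi = \|\cdot\|_1$, so that each subproblem is a projection onto the 1-norm ball and the slope in $\tau$ comes from (6.14). All names are hypothetical.

```python
def project_l1_ball(b, tau):
    """Euclidean projection of b onto {x : ||x||_1 <= tau}."""
    if tau <= 0.0:
        return [0.0] * len(b)
    if sum(abs(v) for v in b) <= tau:
        return list(b)
    mags = sorted((abs(v) for v in b), reverse=True)
    running, theta = 0.0, 0.0
    for k, m in enumerate(mags, start=1):
        running += m
        candidate = (running - tau) / k
        if m > candidate:
            theta = candidate
    return [(1.0 if v >= 0 else -1.0) * max(abs(v) - theta, 0.0) for v in b]


def value_and_slope(b, tau):
    """Return v(b, tau) and dv/dtau = -||b - x||_inf (formula (6.14), A = I)."""
    x = project_l1_ball(b, tau)
    residual = [bi - xi for bi, xi in zip(b, x)]
    return 0.5 * sum(ri * ri for ri in residual), -max(abs(ri) for ri in residual)


def newton_root(b, sigma, tau=0.0, tol=1e-10, max_iter=50):
    """Newton iteration on tau for the scalar equation v(b, tau) = sigma."""
    for _ in range(max_iter):
        v, dv = value_and_slope(b, tau)
        if abs(v - sigma) <= tol or dv == 0.0:
            break
        tau -= (v - sigma) / dv
    return tau


b, sigma = [3.0, -1.0, 0.5], 1.0
tau_star = newton_root(b, sigma)
print(tau_star, value_and_slope(b, tau_star)[0])
```

Because $v(b,\cdot)$ is convex and nonincreasing in this toy setting, starting Newton's method at $\tau = 0$ keeps the iterates to the left of the root and increases them monotonically.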



We consider a common compressive sensing example: we want to recover a 20-sparse vector in $\mathbb{R}^{512}$ from 120 measurements. We use a Gaussian measurement matrix $A \in \mathbb{R}^{100\times 1024}$, by which we mean that each entry is sampled from the distribution

N (0, jq). While measurements to test the BPDN formulation are typically generated 
by contaminating Ax with small Gaussian noise , we generate them according to 



$$b = Ax + w + \zeta,$$

where $w \sim N(0,\sigma^2)$ with $\sigma = 0.005$ matches common practice, and $\zeta$ describes a small randomly placed set of 5 outliers, sampled from $N(0,4)$. For each penalty $\rho$, the $\sigma$ parameter is the true measure of the error in that penalty, i.e., $\sigma_\rho = \rho(\zeta)$. This allows a fair comparison between the penalties.
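The measurement model above can be sketched as follows. This is an illustrative data generator, not the authors' test script. The problem sizes are left as arguments, and several details are assumptions: the entry standard deviation `a_std` of $A$, the distribution of the nonzero signal values, and the reading of $N(0,4)$ as variance 4 (standard deviation 2).

```python
import random


def make_problem(m, n, k, n_outliers=5, noise_std=0.005,
                 outlier_std=2.0, a_std=1.0, seed=0):
    """Generate (A, x, b, zeta) with b = A x + w + zeta.

    x is nonnegative and k-sparse; w is dense small Gaussian noise;
    zeta holds n_outliers randomly placed large Gaussian outliers.
    """
    rng = random.Random(seed)
    A = [[rng.gauss(0.0, a_std) for _ in range(n)] for _ in range(m)]
    x = [0.0] * n
    for j in rng.sample(range(n), k):
        x[j] = abs(rng.gauss(0.0, 1.0))  # assumed distribution of nonzeros
    w = [rng.gauss(0.0, noise_std) for _ in range(m)]
    zeta = [0.0] * m
    for i in rng.sample(range(m), n_outliers):
        zeta[i] = rng.gauss(0.0, outlier_std)
    b = [sum(aij * xj for aij, xj in zip(row, x)) + wi + zi
         for row, wi, zi in zip(A, w, zeta)]
    return A, x, b, zeta


A, x, b, zeta = make_problem(m=30, n=60, k=5)
print(len(b), sum(1 for v in x if v != 0.0), sum(1 for z in zeta if z != 0.0))
```

Given a misfit function `rho`, the error level for that penalty would then be computed as `rho(zeta)`, following the text.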

We expect the Huber function to outperform the least squares penalty by budgeting the error level $\sigma$ to allow a few large outliers, which will never happen with the quadratic. We expect the Student's t penalty to work even better, because it is nonconvex and grows sublinearly as outliers increase. The results in Figure 7.1 demonstrate that this is indeed the case. In many instances the Huber function is able to do just as well as the Student's t; however, often the Student's t does better (and never worse). Both robust penalties always do better than the least squares fit. The code is implemented in SPGL1, and can be downloaded from https://github.com/saravkin/spgll. The particular experiment presented here can be found in tests/spgllTestNN.m.

8. Appendix: Proofs of results. 

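Proof of Theorem 2.1. Let $\sigma \in S_{1,2}$ and set $\tau_\sigma = v_1(\sigma)$. By assumption, $\operatorname{argmin}\mathcal{P}_{1,2}(\sigma) \ne \emptyset$. Let $x_\sigma \in \operatorname{argmin}\mathcal{P}_{1,2}(\sigma)$, so that $\psi_1(x_\sigma) = \tau_\sigma$ and $\psi_2(x_\sigma) = \sigma$. In particular, $x_\sigma$ is feasible for $\mathcal{P}_{2,1}(\tau_\sigma)$. Let $\hat x$ be any other feasible point for $\mathcal{P}_{2,1}(\tau_\sigma)$ so that $\psi_1(\hat x) \le \tau_\sigma = v_1(\sigma) = \psi_1(x_\sigma)$. If $\psi_1(\hat x) < \tau_\sigma = v_1(\sigma)$, then $\psi_2(\hat x) > \sigma$ since otherwise we contradict the definition of $v_1(\sigma)$. If $\psi_1(\hat x) = \tau_\sigma$, then we claim that $\psi_2(\hat x) \ge \sigma$. Indeed, if $\psi_2(\hat x) < \sigma$, then $\hat x \in \operatorname{argmin}\mathcal{P}_{1,2}(\sigma)$ but $\psi_2(\hat x) < \sigma$, which contradicts the fact that $\sigma \in S_{1,2}$. Hence, every feasible point for $\mathcal{P}_{2,1}(\tau_\sigma)$ has $\psi_2(\hat x) \ge \sigma$ with equality only if $\psi_1(\hat x) = \tau_\sigma$. But $x_\sigma$ is feasible for $\mathcal{P}_{2,1}(\tau_\sigma)$ with $\psi_2(x_\sigma) = \sigma$. Therefore, $x_\sigma \in \operatorname{argmin}\mathcal{P}_{2,1}(\tau_\sigma) \subset \{x \in X \mid \psi_1(x) = \tau_\sigma\}$. Consequently, $v_2(v_1(\sigma)) = \sigma$ and

$$\emptyset \ne \operatorname{argmin}\mathcal{P}_{1,2}(\sigma) \subset \operatorname{argmin}\mathcal{P}_{2,1}(\tau_\sigma) \subset \{x \in X \mid \psi_1(x) = \tau_\sigma\}. \tag{8.1}$$

We now show that $\operatorname{argmin}\mathcal{P}_{2,1}(\tau_\sigma) \subset \operatorname{argmin}\mathcal{P}_{1,2}(\sigma)$. Let $\hat x \in \operatorname{argmin}\mathcal{P}_{2,1}(\tau_\sigma)$. In particular, $\hat x$ is feasible for $\mathcal{P}_{2,1}(\tau_\sigma)$ so, by what we have already shown, $\psi_2(\hat x) \ge \sigma$ with equality only if $\psi_1(\hat x) = \tau_\sigma$. But, by our choice of $\hat x$, $\psi_2(\hat x) = v_2(v_1(\sigma)) = \sigma$, so $\psi_1(\hat x) = \tau_\sigma$, i.e., $\hat x \in \operatorname{argmin}\mathcal{P}_{1,2}(\sigma)$.

It remains to establish the final statement of the theorem. By (8.1), we already have that $\{v_1(\sigma) \mid \sigma \in S_{1,2}\} \subset S_{2,1}$, so we need only establish the reverse inclusion. For this, let $\tau \in S_{2,1}$ and set $\sigma_\tau = v_2(\tau)$. By interchanging the indices and applying the first part of the theorem, we have from (8.1) that

$$\emptyset \ne \operatorname{argmin}\mathcal{P}_{2,1}(\tau) \subset \operatorname{argmin}\mathcal{P}_{1,2}(\sigma_\tau) \subset \{x \in X \mid \psi_2(x) = \sigma_\tau\}.$$

That is, $\sigma_\tau \in S_{1,2}$ and, by (a), $\tau = v_1(v_2(\tau)) = v_1(\sigma_\tau)$.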



Proof of the inverse linear image (section 3.3). For A > 0, observe that 
h?{w, A) = A inf p(x) 

Ax— A 

= A inf p(A -1 s) 
= inf p*(s, A) 

inf <! ^(sjC) 



(s := Ax) 



.4 



(8.2) 
(8.3) 



where 



.4 



.4 




Again by 28 Theorem 9.2] in conjunction with |28l C orollary 16.2.1], the function in 
3) is closed if (A T ) _1 dom(p' r )* 7^ 0. Since, by[28[ Corollary 13.5.1], dom(p ,r )* = 



{ (u, 77) J p*(u) < —rj }, we have 



(A T )~ 1 dom^)* ^ 



if and only if {A T ) 1 domp* 7^ 



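Hence, by assumption, the function in (8.3) is closed proper and convex and equals $h^\pi(w,\lambda)$ on the relative interior of its domain. Since $h^\pi(w,\lambda)$ is closed, (8.2) implies that these functions must coincide.

Proof of Lemma 4.1. We first prove (4.1). The conjugate of $\delta((x,\tau) \mid \operatorname{epi} h)$ is obtained as follows:

$$\begin{aligned}
\delta^*((y,\mu) \mid \operatorname{epi} h) &= \sup_{x,\tau}\left[\langle y, x\rangle + \mu\tau - \delta((x,\tau) \mid \operatorname{epi} h)\right] \\
&= \sup_{x,\tau}\left[\langle y, x\rangle + \mu\tau - \delta(h(x) - \tau \mid \mathbb{R}_-)\right] \\
&= \sup_{x,\omega}\left[\langle y, x\rangle + \mu(h(x) - \omega) - \delta(\omega \mid \mathbb{R}_-)\right] \qquad (\omega := h(x) - \tau) \\
&= \sup_{x}\left[\langle y, x\rangle + \mu h(x) + \sup_{\omega}\left[-\mu\omega - \delta(\omega \mid \mathbb{R}_-)\right]\right] \\
&= \sup_{x}\left[\langle y, x\rangle + \mu h(x) + \delta(\mu \mid \mathbb{R}_-)\right].
\end{aligned}$$

For $\mu < 0$, we obtain

$$\delta^*((y,\mu) \mid \operatorname{epi} h) = -\mu\sup_{x}\left[\langle -\mu^{-1}y, x\rangle - h(x)\right] = -\mu h^*(-\mu^{-1}y).$$

Since $h$ is necessarily a closed proper convex function, we obtain the result.

To see (4.2), first note that the function

$$q(y) := \inf_{\mu>0}\left[\tau\mu + \mu h^*(y/\mu)\right] = \inf_{\mu>0}\left[\tau\mu + (h^*)^\pi(y,\mu)\right]$$

is the positively homogeneous function generated by the function $y \mapsto \tau + h^*(y)$ [28, page 35], and so is convex in $y$. Next observe that the conjugate of $q$ is given by

$$\begin{aligned}
q^*(x) &= \sup_{y}\left[\langle x, y\rangle - \inf_{\mu>0}\left[\tau\mu + (h^*)^\pi(y,\mu)\right]\right] \\
&= \sup_{y,\,\mu>0}\left[\langle x, y\rangle + \tau(-\mu) - (h^*)^\pi(y,\mu)\right] \\
&= \sup_{y,\,\mu<0}\left[\langle x, y\rangle + \tau\mu - (h^*)^\pi(y,-\mu)\right] \qquad (\text{exchange } -\mu \text{ for } \mu) \\
&= \sup_{(y,\mu)}\left[\langle x, y\rangle + \tau\mu - \delta^*((y,\mu) \mid \operatorname{epi} h)\right] \qquad (\text{by (4.1)}) \\
&= \delta((x,\tau) \mid \operatorname{epi} h) = \delta(x \mid \operatorname{lev}_h(\tau)).
\end{aligned}$$

The result now follows from the Bi-Conjugate Theorem [28, Theorem 12.2].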



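Proof of Theorem 4.2. Combining (D) with (4.2) and (4.3) gives

$$\begin{aligned}
v(b,\tau) &:= \sup_{u,\,\mu\le 0}\ \langle b, u\rangle + \tau\mu - (\phi^*)^\pi(A^T u, -\mu) - \rho^*(u) \\
&= \sup_{u}\left[\langle b, u\rangle - \rho^*(u) - \inf_{\mu\le 0}\left[\tau(-\mu) + (\phi^*)^\pi(A^T u, -\mu)\right]\right] \\
&= \sup_{u}\left[\langle b, u\rangle - \rho^*(u) - \inf_{\mu\ge 0}\left[\tau\mu + (\phi^*)^\pi(A^T u, \mu)\right]\right] \\
&= \sup_{u}\left[\langle b, u\rangle - \rho^*(u) - \delta^*(A^T u \mid \operatorname{lev}_\phi(\tau))\right],
\end{aligned} \tag{8.4}$$

where the final equality follows from (4.2). The equivalence (4.4a) follows from the definition of the conjugate and the equivalence (4.4b) follows from [28, Theorems 16.3 and 16.4] which tell us that

$$\begin{aligned}
g_\tau^*(b) &= \operatorname{cl}\left(\rho \mathbin{\square} \left[\delta^*(A^T\cdot \mid \operatorname{lev}_\phi(\tau))\right]^*\right)(b) \\
&= \operatorname{cl}\left[\inf_{w_1+w_2=\,\cdot}\ \rho(w_1) + \inf_{Ax=w_2}\delta(x \mid \operatorname{lev}_\phi(\tau))\right](b) \\
&= \operatorname{cl}\left[\inf_{\phi(x)\le\tau}\rho(\cdot - Ax)\right](b) \\
&= \operatorname{cl}(v(\cdot,\tau))(b).
\end{aligned}$$

The uniqueness of $\bar u$ when $\rho$ is differentiable follows from the essential strict convexity of $\rho^*$ [28, Theorem 26.3].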

Proof of Lemma 4.3.

Part 1. The inequality follows immediately from (4.2). But it is also easily derived from the observation that if $\mu > 0$ and $x \in \operatorname{lev}_\phi(\tau)$, then

$$\begin{aligned}
\tau\mu + \mu\phi^*(s/\mu) &\ge \tau\mu + \mu\left[\langle x, s/\mu\rangle - \phi(x)\right] \qquad \text{[Fenchel-Young inequality]} \\
&\ge \phi(x)\mu + \langle x, s\rangle - \mu\phi(x) \\
&= \langle x, s\rangle.
\end{aligned}$$

Taking the sup over $x \in \operatorname{lev}_\phi(\tau)$ gives the result.

Part 2. A key fact [28, Theorems 23.5 and 23.7] used repeatedly throughout our proof is that for any non-empty closed convex set $U$ and $u \in U$, we have

$$v \in N(u \mid U) \iff u \in \partial\delta^*(v \mid U) = \operatorname*{arg\,max}_{u\in U}\ \langle v, u\rangle. \tag{8.5}$$

Also, from the Fenchel-Young inequality,

$$\tau + \phi^*(s) \ge \phi(\bar x) + \phi^*(s) \ge \langle s, \bar x\rangle. \tag{8.6}$$

We divide this part of the proof into two parts: (i) if $S_1 \ne \emptyset$, show $S_1 \subset S_2$, and (ii) if $S_2 \ne \emptyset$, show $S_2 \subset S_1$ and equality holds in (4.5). Combined, these implications establish Part 2 of the lemma.

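(i) Let $\bar\mu \in S_1$. We show that $\bar\mu \in S_2$. Let us first consider the case where $\phi(\bar x) < \tau$. We begin by showing that $T(\bar x \mid \operatorname{lev}_\phi(\tau)) = T(\bar x \mid \operatorname{dom}\phi)$. Clearly, $T(\bar x \mid \operatorname{lev}_\phi(\tau)) \subset T(\bar x \mid \operatorname{dom}\phi)$. To obtain the reverse inclusion, [29, Theorem 6.9] tells us that we need only show

$$\bigcup_{\lambda>0}\lambda(\operatorname{dom}\phi - \bar x) \subset \bigcup_{\lambda>0}\lambda(\operatorname{lev}_\phi(\tau) - \bar x).$$

Let $\lambda > 0$ and $x \in \operatorname{dom}\phi$. We need to show that there is a $\sigma > 0$ and $z \in \operatorname{lev}_\phi(\tau)$ such that $\sigma(z - \bar x) = \lambda(x - \bar x)$. Observe that $\phi(\bar x + \eta(x - \bar x)) \le (1-\eta)\phi(\bar x) + \eta\phi(x) < (1-\eta)\tau + \eta\phi(x)$ for all $0 \le \eta < 1$. Hence, there is an $\bar\eta > 0$ such that $z = \bar x + \bar\eta(x - \bar x) \in \operatorname{lev}_\phi(\tau)$ ($\bar\eta = 1$ if $x \in \operatorname{lev}_\phi(\tau)$; otherwise $\bar\eta = (\tau - \phi(\bar x))/(\phi(x) - \phi(\bar x)) \in (0,1)$). Setting $\sigma = \lambda/\bar\eta$ gives $\sigma(z - \bar x) = \sigma\bar\eta(x - \bar x) = \lambda(x - \bar x)$. Therefore, $T(\bar x \mid \operatorname{lev}_\phi(\tau)) = T(\bar x \mid \operatorname{dom}\phi)$, and consequently $N(\bar x \mid \operatorname{lev}_\phi(\tau)) = N(\bar x \mid \operatorname{dom}\phi)$. Hence, by (8.5), $s \in N(\bar x \mid \operatorname{dom}\phi)$. Therefore, if $\bar\mu = 0$, we have $\bar\mu \in S_2$. On the other hand, if $\bar\mu > 0$, by (8.5) and the fact that $N(\bar x \mid \operatorname{lev}_\phi(\tau)) = N(\bar x \mid \operatorname{dom}\phi)$, we have

$$\begin{aligned}
\langle s, \bar x\rangle &= \delta^*(s \mid \operatorname{dom}\phi) \\
&= (\phi^*)^\infty(s) \qquad \text{[28, Theorem 13.3]} \\
&= \tau\cdot 0 + (\phi^*)^\pi(s, 0) \\
&\ge \tau\bar\mu + (\phi^*)^\pi(s, \bar\mu) \\
&> \bar\mu\phi(\bar x) + \bar\mu\left[\langle s/\bar\mu, \bar x\rangle - \phi(\bar x)\right] \qquad \text{[Fenchel-Young inequality]} \\
&= \langle s, \bar x\rangle.
\end{aligned}$$

Since this cannot occur, it must be the case that $\bar\mu = 0$ and $\bar\mu \in S_2$.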

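Now suppose that $\phi(\bar x) = \tau$, and let us first consider the case where $s = 0$. Then, for $\mu > 0$, $p_\tau(s,\mu) = (\tau + \phi^*(0))\mu \ge 0$ by (8.6), and, for $\mu = 0$, $p_\tau(s,\mu) = (\phi^*)^\infty(0) = 0$. Therefore, $0 = \inf_{0\le\mu} p_\tau(s,\mu)$ with $\mu = 0 \in S_1$. But, in this case, it is also clear that $0 \in S_2$, since $s = 0 \in N(\bar x \mid \operatorname{dom}\phi)$. Thus, if $\bar\mu = 0$ we have $\bar\mu \in S_2$. If $\bar\mu > 0$, then $0 = \tau + \phi^*(0)$ since $0 = p_\tau(0,\bar\mu) = (\tau + \phi^*(0))\bar\mu$. But then, by (8.6), $\phi(\bar x) + \phi^*(0) = \langle s, \bar x\rangle = 0$ so that $s = 0 \in \partial\phi(\bar x)$. However, $\phi(\bar x) = \tau > \inf\phi$, so $0 \notin \partial\phi(\bar x)$ [28, Theorem 23.5(b)]. This contradiction implies that if $s = 0$, then we must also have $\bar\mu = 0$, and, in particular, we have $S_1 \subset S_2$.

Let us now consider the case where $\phi(\bar x) = \tau$ and $s \ne 0$. By [28, Theorem 23.7],

$$N(\bar x \mid \operatorname{lev}_\phi(\tau)) = \operatorname{cl}(\operatorname{cone}(\partial\phi(\bar x))), \tag{8.7}$$

and, by [29, Proposition 8.12],

$$\partial\phi(\bar x)^\infty = N(\bar x \mid \operatorname{dom}\phi). \tag{8.8}$$

By (8.5) and (8.7), $s \in \operatorname{cl}(\operatorname{cone}(\partial\phi(\bar x)))$. That is, there exist $\{v^k\} \subset \partial\phi(\bar x)$ and $\{\mu_k\} \subset \mathbb{R}_+$ such that $\mu_k v^k \to s$. Because $\partial\phi(\bar x)$ is closed, we can assume with no loss in generality that either (a) there exists $v \in \partial\phi(\bar x)$ such that $v^k \to v$, or (b) $\|v^k\| \uparrow \infty$ and there exists $u \ne 0$ such that $v^k/\|v^k\| \to u$. Since $s \ne 0$, (a) implies that there is a $\mu \ne 0$ such that $\mu_k \to \mu$ and $s = \mu v \in \operatorname{cone}(\partial\phi(\bar x))$, while (b) and (8.8) imply that $s \in N(\bar x \mid \operatorname{dom}\phi)$. In summary, we have

$$\text{either (a)}\ s \in \operatorname{cone}(\partial\phi(\bar x)) \quad\text{or (b)}\ s \in N(\bar x \mid \operatorname{dom}\phi). \tag{8.9}$$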



Using (8.9), let us first suppose that s <f N(x\ dom</)) so, in particular, s G 
cone (d(j)(x)). As an immediate consequence, we have that S 2 ^ and the only values 
of fi for which s G fi + d<fi(x) have /x > since s ^ 2V(a:| dom0). Let < /i £ S 2 . If 
~p = 0, then 

** (5 | dom</>) = 

< r/t + fi(j)*(s/p,) 

= ft(j>(x) + ft[(s/fl, x) — 4>(x)] [Fenchel- Young inequality] 
= (s, x) 

< <5 (s | dom ) 
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so that (s, x) = S*(s\ dom0), or equivalently, s e JV(i dom</>) contradicting the 
choice of s. Hence, it must be the case that ~p > 0. Again let < fi 6 S 2 . Then, by 
Part 1, 

6*(s | lev^(r)) < p T (s,Jl) 

= n inf Pt{s, M) 

0</x 

< Tp, + fl(f>* (S I fl) 

= fi(f)(x) + /t[(s//t, x) — <fi(x)] [Fenchel- Young inequality] 

= (s, x) 

<S*(s\\ev^T)) 

so that (s, x) = ~p[(f>(x) + (f)*(s/jl)], or equivalently, s g ~pd(f)(x). Hence, ~p € S l 2 . 
Finally, consider the case where 0^s€ TV (x | dom (j) ) . Then 

inf p T (s,n) < p T (s,Q) 

M>0 



= 5 (s | dom 4> ) 

= (s, x) 

= S*(s\ lev^(r)) 
< inf p T (s,n) , 

/j>0 



(28) Theorem 13.3] 



[by (8.5)] 



[again by (8.5 1 
[Part 1] 



so € S 1 ! and £ S 2 . If JX > 0, then this string of equivalences also implies that 
(s, x) = p T (s,Jl) = Jl[4>(x) + cj>*(s/~p)], or equivalently, s £ ~pd(f)(x) so that JL 6 5 2 . 
Putting this all together, we get that S[ C S 2 - 
(ii) Let /Z € S , 2 . If /Z = 0, then 

p T (s,o) = («n°°(<o 



= <5* (s | dom (j>) 
= (s, x) 

<**(*|lev*(r)) 
< inf p T (s,n). 



28, Theorem 13.3] 



[Part 1] 



Therefore, fi = € S 1 and equality holds in (4.5) 



On the other hand, if fi > 0, then s/fi £ d<j){x), and so 

[Fenchel- Young inequality] 



Tp l + {4>*Y{s,p)='p, [<p(x) + 4>* (s/fi)} 
= p,(x, s/fi) 
= (x, s) 

<<f(s|lev (r)) 

< inf [tm+^TC^M)] 



[Part 1] 



Hence, $\bar\mu \in S_1$ and equality holds in (4.5).

Proof of Lemma 4.4.

Part 1. The primal coercivity equivalence follows from [28, Theorems 8.4 and 8.7] since $\operatorname{hzn}(f(\cdot,b,\tau)) = \operatorname{hzn}(\phi) \cap [-A^{-1}\operatorname{hzn}(\rho)]$.

Part 2. For the dual coercivity equivalence, let $\hat g(u) = g_\tau(u) - \langle b, u\rangle$, which is the objective of the reduced dual $\mathcal{D}_\tau$. By (4.4b), $\hat g^*(0) = g_\tau^*(b) = \operatorname{cl}(v(\cdot,\tau))(b) \le v(b,\tau)$. Therefore, the result follows from [28, Corollary 14.2.2] since by (8.10), $\operatorname{dom} v(\cdot,\tau) = \operatorname{dom}\rho + A\operatorname{lev}_\phi(\tau)$.



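Proof of Theorem 5.1. The expression for $f$ is derived in (4.3). The weak and strong duality relationships as well as the expression for $\partial v$ follow immediately from [29, Theorem 11.39].

Next, note that

$$\operatorname{dom} f(\cdot,b,\tau) \ne \emptyset \iff \exists\, x \in \operatorname{lev}_\phi(\tau) \ \text{with}\ b - Ax \in \operatorname{dom}\rho \iff b \in \operatorname{dom}\rho + A\operatorname{lev}_\phi(\tau). \tag{8.10}$$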
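Now assume that $b \in \operatorname{int}(\operatorname{dom}\rho + A(\operatorname{lev}_\phi(\tau)))$. Recall from [28, Theorem 6.6 and Corollary 6.6.2] that

$$\operatorname{int}(\operatorname{dom}\rho + A(\operatorname{lev}_\phi(\tau))) = \operatorname{ri}(\operatorname{dom}\rho) + A(\operatorname{ri}(\operatorname{lev}_\phi(\tau))). \tag{8.11}$$

Moreover, by [28, Lemma 7.3], for any convex function $p$,

$$\operatorname{ri}(\operatorname{epi} p) = \{(x,\mu) \mid x \in \operatorname{ri}(\operatorname{dom} p) \ \text{and}\ p(x) < \mu\}.$$

Since $\operatorname{lev}_p(\tau) = P(\operatorname{epi} p \cap \{(x,\mu) \mid \mu \le \tau\})$, where $P$ is the projection $(x,\mu) \mapsto x$, [28, Theorems 6.5 and 6.6] tells us that

$$\operatorname{ri}(\operatorname{lev}_p(\tau)) = P(\operatorname{ri}(\operatorname{epi} p) \cap \operatorname{ri}(\{(x,\mu) \mid \mu \le \tau\})) = \{x \in \operatorname{ri}(\operatorname{dom} p) \mid p(x) < \tau\}. \tag{8.12}$$

Since $b \in \operatorname{int}(\operatorname{dom}\rho + A(\operatorname{lev}_\phi(\tau)))$, (8.11)-(8.12) imply the existence of $\bar w \in \operatorname{ri}(\operatorname{dom}\rho)$ and $\bar x \in \operatorname{ri}(\operatorname{dom}\phi)$ with $\phi(\bar x) < \tau$ such that $b = \bar w + A\bar x$. Since $\phi$ is relatively continuous on the relative interior of its domain [28, Theorem 10.1], there exists $\delta > 0$ such that

$$\begin{aligned}
&(\bar w + \delta\mathbb{B}) \cap \operatorname{dom}\rho \subset \operatorname{ri}(\operatorname{dom}\rho), \\
&(\bar x + \delta\mathbb{B}) \cap \operatorname{dom}\phi \subset \operatorname{ri}(\operatorname{dom}\phi), \\
&\phi(x) \le \tfrac12(\phi(\bar x) + \tau) \quad \forall\, x \in (\bar x + \delta\mathbb{B}) \cap \operatorname{dom}\phi.
\end{aligned}$$

Set $S_\rho = (\bar w + \delta\mathbb{B}) \cap \operatorname{dom}\rho$ and $S_\phi = (\bar x + \delta\mathbb{B}) \cap \operatorname{dom}\phi$. Since

$$\begin{aligned}
\operatorname{cone}(S_\rho + AS_\phi - b) &= \operatorname{cone}(S_\rho - \bar w) + A\operatorname{cone}(S_\phi - \bar x) \\
&= \operatorname{span}(\operatorname{dom}\rho - \bar w) + A\operatorname{span}(\operatorname{dom}\phi - \bar x) \\
&= \operatorname{span}(\operatorname{dom}\rho + A\operatorname{dom}\phi - b) \\
&\supset \operatorname{cone}(\operatorname{dom}\rho + A\operatorname{dom}\phi - b) \\
&= \mathbb{R}^m \qquad (b \in \operatorname{int}(\operatorname{dom}\rho + A(\operatorname{lev}_\phi(\tau)))),
\end{aligned}$$

we have $0 \in \operatorname{int}(S_\rho + AS_\phi - b)$. Therefore, there exists an $\epsilon > 0$ such that $b + \epsilon\mathbb{B} \subset S_\rho + AS_\phi$. Consequently, if $\hat b \in b + \epsilon\mathbb{B}$ and $|\hat\tau - \tau| \le \tfrac12(\tau - \phi(\bar x))$, then $\operatorname{dom} f(\cdot,\hat b,\hat\tau) \ne \emptyset$ and so $(\hat b,\hat\tau) \in \operatorname{dom} v$.

On the other hand, if $(b,\tau) \in \operatorname{int}(\operatorname{dom} v)$, then $\operatorname{dom} f(\cdot,\hat b,\hat\tau) \ne \emptyset$ for all $(\hat b,\hat\tau)$ near $(b,\tau)$ so that $\operatorname{dom} f(\cdot,\hat b,\tau) \ne \emptyset$ for all $\hat b$ near $b$. Hence $b \in \operatorname{int}(\operatorname{dom}\rho + A(\operatorname{lev}_\phi(\tau)))$.

Proof of Theorem 5.2.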



Part 1. First note that (5.1c) is equivalent to the optimality condition

$$0 \in -A^T\partial\rho(b - A\bar x) + \partial\delta(\bar x \mid \operatorname{lev}_\phi(\tau)) \tag{8.13}$$

for the problem (P), and hence by [28, Theorem 23.8], $\bar x$ solves (P). Moreover, by [28, Theorem 23.5], (5.1c) is equivalent to

$$b - A\bar x \in \partial\rho^*(\bar u), \qquad \bar x \in \partial\delta^*(A^T\bar u \mid \operatorname{lev}_\phi(\tau)),$$

or, equivalently,

$$b \in \partial\rho^*(\bar u) + A\,\partial\delta^*(A^T\bar u \mid \operatorname{lev}_\phi(\tau)), \tag{8.14}$$

which by [28, Theorem 23.8] implies that $\bar u$ solves the reduced dual $\mathcal{D}_\tau$.

Part 2. If $\bar x$ solves (P) then

$$0 \in \partial\left[\rho(b - A(\cdot)) + \delta(\cdot \mid \operatorname{lev}_\phi(\tau))\right](\bar x),$$

which by [28, Theorems 23.8, 23.9] and (5.1a) is equivalent to (8.13), which in turn is equivalent to (5.1c).

Part 3. If $\bar u$ solves $\mathcal{D}_\tau$ then

$$b \in \partial\left[\rho^*(\cdot) + \delta^*(A^T(\cdot) \mid \operatorname{lev}_\phi(\tau))\right](\bar u),$$

which by [28, Theorems 23.8, 23.9] and (5.1b) is equivalent to (8.14), which in turn is equivalent to (5.1c).

Part 4. The equivalence (5.1c) follows from (5.1d), Part 2 of Lemma 4.3 and the fact that $A^T\bar u \in N(\bar x \mid \operatorname{lev}_\phi(\tau))$ if and only if $\bar x \in \partial\delta^*(A^T\bar u \mid \operatorname{lev}_\phi(\tau))$. To see (5.1d), note that (4.6), (5.1a), and Part 1 of Lemma 4.4 imply that the primal objective is coercive, so a solution $\bar x$ exists. Hence, by Part 2, there exists $\bar u$ so that $(\bar x,\bar u)$ satisfies (5.1c). Analogously, (4.7), (5.1b), and Part 2 of Lemma 4.4 imply that the solution $\bar u$ to the dual exists, and so by Part 3, there exists $\bar x$ such that the pair $(\bar x,\bar u)$ satisfies (5.1c). In either case, the subdifferential is nonempty and is given by (5.1d).



Proof of Lemma 6.1. Formula (6.3a) is just [28, Theorem 14.5]. The first equation in (6.3b) is obvious and the second follows from (6.3a) and the definition of the barrier cone. The formula (6.3c) is now obvious. Formulas (6.3d) and (6.3e) follow immediately from the definitions and [28, Corollary 8.3.3]. Formula (6.3f) follows from (6.3e), [28, Corollary 14.2.1], and [28, Corollary 16.4.2].

First note that (6.4) implies that $\operatorname{ri}(\operatorname{cone}(U)) \cap \operatorname{ri}(X) \ne \emptyset$. Hence, the formula (6.5a) follows from [28, Theorem 16.4] and (6.3c). To see (6.5b), observe that the expression on the RHS is again an infimal convolution for which $\inf = \min$ for the same reason as for (6.5a). The equivalence with $(\phi^*)^\pi(z,\mu)$ follows from the calculus rules in section 3.3. For formula (6.5d), first note that

$$\begin{aligned}
\inf_{\mu\ge 0}\left[\tau\mu + (\phi^*)^\pi(z,\mu)\right] &= \inf_{\mu\ge 0}\left[\tau\mu + \inf_{s}\left[\delta^*(z - s \mid X) + \delta(s \mid \mu U^\circ)\right]\right] \\
&= \inf_{s}\left[\delta^*(z - s \mid X) + \inf_{\mu\ge 0}\left[\tau\mu + \delta(s \mid \mu U^\circ)\right]\right] \\
&= \inf_{s}\left[\delta^*(z - s \mid X) + \tau\gamma(s \mid U^\circ)\right].
\end{aligned}$$

Again, the final infimum in this derivation is an infimal convolution for which $\inf = \min$ for the same reasons as in (6.5a) since, by (6.3c) and [28, Theorem 14.5],

$$\operatorname{dom}((\tau\gamma(\cdot \mid U^\circ))^*) = \operatorname{dom}((\delta^*(\cdot \mid \tau U))^*) = \operatorname{dom}\delta(\cdot \mid \tau U) = \tau U.$$

Therefore, an optimal $\bar s$ in this infimal convolution exists giving $\bar\mu = \gamma(\bar s \mid U^\circ)$ as the optimal solution to the first min in (6.5d).

Formula (6.5e) is an immediate consequence of (6.3d), (6.4), and [28, Corollary 23.8.1].



Proof of Theorem 6.2. By (6.3d) and the calculus rules for the relative interior [28, Section 6], (5.1a) and (6.8) are equivalent. Similarly, by (6.3f) and [28, Theorem 6.3], (5.1b) and (6.9) are equivalent.

Part 1. Since (6.4) holds, the formula (6.5e) holds and so (6.10) and (5.1c) are equivalent. Hence, the result follows from Part 1 of Theorem 5.2.

Part 2. Since (5.1a) and (6.8) are equivalent, the result follows from Part 2 of Theorem 5.2.

Part 3. Since (5.1b) and (6.9) are equivalent, the result follows from Part 3 of Theorem 5.2.

Part 4. By pel, (16. Ill) is equivalent to p~6|) and (|5.1a 



is equivalent to (4.7) and (5.1b). Therefore, by Theorem 5.2 (6.13) is equivalent to 
(5.1d) since T7 (s | £7°) = inf^xj [t/i + 8 [s | /it/°). The final equivalence is identical 



and, by (6.3c), (6.12| 



to that of Theorem 15.2 



Proof of Lemma 6.3. The formula for $\operatorname{dom}\phi$ follows from (6.17). Indeed, by (6.17), $x \in \operatorname{dom}\phi$ if and only if there exists $s \in \mathbb{R}^k$ such that $x - Ls \in \operatorname{dom}\gamma(\cdot \mid U^\circ) = \operatorname{cone}(U^\circ)$, or equivalently, $x \in \operatorname{cone}(U^\circ) + \operatorname{Ran}(L) = \operatorname{cone}(U^\circ) + \operatorname{Ran}(B)$. The formula for $\operatorname{hzn}(\phi)$ follows immediately from [28, Theorem 14.2] and (6.16). In particular, $\phi$ is coercive if and only if $\{0\} = \operatorname{hzn}(\phi)$, or equivalently $\operatorname{cone}(U) = \mathbb{R}^n$, i.e., $0 \in \operatorname{int}(U)$.

Next we show that the $\lambda$ given in (6.20) solves (6.18). First observe that the optimal $\lambda$ must be at least $\gamma(w \mid U)$, and from elementary calculus, the minimizer of the hyperbola $\|w\|_B^2/(2\lambda) + \tau\lambda$ for $\lambda \ge 0$ is given by $\|w\|_B/\sqrt{2\tau}$. Therefore, the minimizing $\lambda$ is given by (6.20). Substituting this value into (6.18) gives (6.19).

It is now easily shown that the function in (6.19) is lower semi-continuous. Therefore, the equivalence in (6.18) follows from (4.2).



Proof of Theorem 6.4. By [28, Theorem 7.6],

$$\operatorname{ri}(\operatorname{lev}_\phi(\tau)) = \{x \mid x \in \operatorname{ri}(\operatorname{dom}\phi),\ \phi(x) < \tau\}.$$

Hence, by Lemma 6.3 the equivalence between (5.1) and (6.21), (6.22), (6.24), (6.25) respectively, is easily seen. Therefore, Parts 1-4 follow immediately from Theorem 5.2.

Proof of Corollary 6.5. Condition (6.27a) occurs when $\bar\mu = 0$ since $0 \odot \partial\phi(\bar x) = N(\bar x \mid \operatorname{dom}\phi)$. When $\bar\mu > 0$, by [28, Theorem 23.5], $\partial\phi(\bar x) = \operatorname*{arg\,max}_{w\in U}\left[\langle \bar x, w\rangle - \tfrac12\langle w, Bw\rangle\right]$, so that $w \in \partial\phi(\bar x)$ if and only if $\bar x \in Bw + N(w \mid U)$.